Introduction

“Semantic parsing” is, ironically, a semantically ambiguous term. It can refer to:

Semantic role labeling (also known as shallow semantic parsing, see previous post for more information).
Finding generic relations in text.
Transforming a natural language sentence into its meaning representation.

In this post, semantic parsing is the task of mapping natural language (NL) sentences into complete formal meaning representations (MRs) that a computer can execute for some domain-specific application ¹. Representing the meaning of natural language is ultimately a difficult philosophical question, and many attempts have been made to define generic formal semantics for natural language, including:

Abstract Meaning Representation (AMR)
Semantic Dependency Parsing (SDP)
Meaning Representation Language (MRL)
SQL Query

Abstract Meaning Representation

Abstract Meaning Representation (AMR) ² is a semantic formalism in which the meaning of a sentence is encoded as a rooted, directed, acyclic graph. Nodes represent concepts, and labeled directed edges represent the relationships between them. For example, the sentence “The boy wants the girl to believe him” corresponds to the following AMR graph (in PENMAN format):

(w / want-01
	:ARG0 (b / boy)
	:ARG1 (b2 / believe-01
		:ARG0 (g / girl)
		:ARG1 b))
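
To make the graph structure concrete, the following is a minimal sketch of reading the PENMAN string above into concept instances and labeled edges. The function name `parse_penman` and its return shape are illustrative assumptions, not part of any AMR toolkit, and the tokenizer ignores quoted string constants that contain spaces or parentheses.

```python
import re

AMR = """
(w / want-01
    :ARG0 (b / boy)
    :ARG1 (b2 / believe-01
        :ARG0 (g / girl)
        :ARG1 b))
"""

def parse_penman(text):
    """Parse a simple PENMAN string into (top, instances, edges).

    Hypothetical helper for illustration only.
    """
    # Tokens are "(", ")", "/" or any run of other non-space characters.
    tokens = re.findall(r"[()/]|[^\s()/]+", text)
    pos = 0
    instances = {}  # variable -> concept, e.g. "w" -> "want-01"
    edges = []      # (source variable, role, target variable)

    def parse_node():
        nonlocal pos
        # Expected shape: "(" variable "/" concept (role target)* ")"
        if tokens[pos] != "(" or tokens[pos + 2] != "/":
            raise ValueError("malformed node at token %d" % pos)
        var, concept = tokens[pos + 1], tokens[pos + 3]
        instances[var] = concept
        pos += 4
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = parse_node()
            else:
                # A bare variable re-uses an existing node (re-entrancy).
                target = tokens[pos]
                pos += 1
            edges.append((var, role, target))
        pos += 1  # consume ")"
        return var

    top = parse_node()
    return top, instances, edges

top, instances, edges = parse_penman(AMR)
```

The re-entrant edge `("b2", ":ARG1", "b")` points back at the `boy` node, which is what makes the structure a graph rather than a tree.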

AMR parsing, the task of transforming a sentence into its AMR graph, is challenging because the parser must learn to predict not only concepts, which consist of predicates, lemmas, named entities, wiki-links and co-references, but also a large number of relation types based on the semantic roles in PropBank (see the previous post on Semantic Role Labeling if you are interested).

Data

There are multiple versions of the AMR data, including:

LDC2013E117: 10,853 sentences with 13,050 AMRs (corpus includes multiple annotations for some sentences)
LDC2014T12 (AMR 1.0): 13,051 sentences.
LDC2015E86: 19,572 sentences. This version includes more AMRs, wikification, AMR adoption of new unified PropBank frames, AMR deepening, automatic AMR-English alignments, and correction of AMR annotation errors.
LDC2016E25: 39,260 sentences.
LDC2017T10 (AMR 2.0): 39,260 sentences.

Experiment & Results

These are the AMR parsing results on the LDC2014T12, LDC2015E86 and LDC2017T10 test sets, where LDC2014N refers to the newswire section of LDC2014T12. Consult the original papers if you are interested in the models. Please contact me by email if any of the statistics are incorrect.

Paper	Model	LDC2014N	LDC2014	LDC2015	LDC2017
ACL-18	Lyu & Titov University of Edinburgh & University of Amsterdam	-	-	73.7	74.4
ACL-18	Groschwitz et al. Saarland University & Macquarie University	-	-	70.2	71.0
ACL-17	Buys & Blunsom University of Oxford & DeepMind	-	-	-	61.9
ACL-17	Foland & Martin University of Colorado	-	-	70.7	-
EMNLP-18	Guo & Lu Singapore University of Technology & Design	74.0	68.3	68.7	69.8
Arxiv-17	van Noord & Bos University of Groningen	-	-	68.5	71.0
EMNLP-17	Wang & Xue Brandeis University	-	68.1	68.1	-
Semeval-16	Wang et al. (CAMR) Brandeis University & Boulder Language Technologies	-	66.5	67.3	-
Semeval-16	Barzdins & Gosko University of Latvia	-	-	67.2	-
Semeval-16	Flanigan et al. (JAMR) Carnegie Mellon University & University of Washington	-	66	67	-
EACL-17	Damonte et al. University of Edinburgh & University of Padua	-	64	64	-
NAACL-18	Vilares & Gómez-Rodríguez Universidade da Coruña	-	-	64	-
AAAI-18	Peng et al. University of Rochester & University of Padua	-	-	64	-
ACL-17	Konstas et al. University of Washington & Allen Institute for Artificial Intelligence	-	-	62.1	-
EACL-17	Peng et al. University of Rochester & Brandeis University	-	-	52	-
EMNLP-15	Pust et al. University of Southern California	-	67.1	-	-
EMNLP-16	Zhou et al. Nanjing Normal University & DFKI, Germany	71	66	-	-
Semeval-16	Goodman et al. University College London & University of Sheffield	70	-	64	-
ACL-IJCNLP-15	Wang et al. (CAMR) Brandeis University & Boulder Language Technologies	70	66	-	-
EMNLP-17	Ballesteros & Al-Onaizan IBM T.J. Watson Research Center	69	64	-	-
EMNLP-15	Artzi et al. Cornell University & University of Washington	67	-	-	-
ACL-IJCNLP-15	Werling et al. Stanford University	62	-	-	-
NAACL-15	Wang et al. (CAMR) Brandeis University & Harvard Medical School	-	59	-	-
ACL-14	Flanigan et al. (JAMR) Carnegie Mellon University	59	58	-	-
CoNLL-15	Peng et al. University of Rochester	58	-	-	-

Semantic Dependency Parsing

Task 8 at SemEval 2014 defines Broad-Coverage Semantic Dependency Parsing (SDP) as the problem of recovering sentence-internal predicate-argument relationships for all content words, i.e., the semantic structure constituting the relational core of sentence meaning ³.

SDP seeks to stimulate the dependency parsing community to move towards more general graph processing, and thus to enable a more direct analysis of “Who did What to Whom?”. In addition to its relation to syntactic dependency parsing, the task also has some overlap with Semantic Role Labeling (SRL) ⁴.

SDP parsers are required to identify all semantic dependencies, i.e., to compute a representation that integrates all content words in one structure. Unlike SRL, the SDP task does not encompass predicate disambiguation, a design decision owed in part to SDP’s goal of focusing on parsing-oriented, i.e., structural, analysis, and in part to the lack of consensus on sense inventories for all content words.

SDP uses three distinct target representations for semantic dependencies:

DM: DELPH-IN MRS-Derived Bi-Lexical Dependencies

These semantic dependency graphs originate in a manual re-annotation of Sections 00–21 of the WSJ Corpus with syntactico-semantic analyses derived from the LinGO English Resource Grammar (ERG) ⁵. The DM target representations are derived through a two-step ‘lossy’ conversion of MRSs, first to variable-free Elementary Dependency Structures (EDS) ⁶, then to ‘pure’ bi-lexical form, projecting some construction semantics onto word-to-word dependencies ⁷. For example, the sentence “Ms. Haag plays Elianti.” has the DM representation below. The last two columns give the role each token fills for the first and second predicate, where the predicates are the tokens marked + in the pred column:

id	form	lemma	pos	top	pred	arg1	arg2
1	Ms.	Ms.	NNP	-	+	_	_
2	Haag	Haag	NNP	-	-	compound	ARG1
3	plays	play	VBZ	+	+	_	_
4	Elianti	Elianti	NNP	-	-	_	ARG2
5	.	.	.	-	-	_	_
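
As a rough illustration of how the table above encodes a graph, the sketch below turns its rows into labeled edges. It assumes the SDP column convention that the i-th argument column belongs to the i-th token marked `+` in the `pred` column; the names `SDP_ROWS` and `read_sdp_graph` are hypothetical.

```python
# The rows of the table above, tab-separated, without the header line.
SDP_ROWS = """\
1\tMs.\tMs.\tNNP\t-\t+\t_\t_
2\tHaag\tHaag\tNNP\t-\t-\tcompound\tARG1
3\tplays\tplay\tVBZ\t+\t+\t_\t_
4\tElianti\tElianti\tNNP\t-\t-\t_\tARG2
5\t.\t.\t.\t-\t-\t_\t_
"""

def read_sdp_graph(block):
    """Return (top token ids, edges) for one sentence.

    Edges are (head id, label, dependent id) triples.
    """
    rows = [line.split("\t") for line in block.strip().splitlines()]
    # Token ids flagged as top nodes and as predicates, in sentence order.
    tops = [int(r[0]) for r in rows if r[4] == "+"]
    predicates = [int(r[0]) for r in rows if r[5] == "+"]
    edges = []
    for r in rows:
        dependent = int(r[0])
        # Pair each argument column with the predicate it belongs to.
        for head, label in zip(predicates, r[6:]):
            if label != "_":
                edges.append((head, label, dependent))
    return tops, edges

tops, edges = read_sdp_graph(SDP_ROWS)
```

Under this reading, “plays” (token 3) is the top node and takes “Haag” and “Elianti” as ARG1 and ARG2, while “Ms.” (token 1) is linked to “Haag” by a compound edge.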

PAS: Enju Predicate-Argument Structures

The Enju parsing system is an HPSG-based parser for English. The grammar and the disambiguation model of this parser are derived from the Enju HPSG treebank, which is automatically converted from the phrase structure and predicate-argument structure annotation of the PTB. The PAS data set is extracted from the WSJ portion of the Enju HPSG treebank. While the Enju treebank is annotated with full HPSG-style structures, only its predicate-argument structures are converted into the SDP data format for use in the SDP task.

PCEDT: Prague Tectogrammatical Bi-Lexical Dependencies

The Prague Czech-English Dependency TreeBank (PCEDT) ⁸ is a set of parallel dependency trees over the WSJ texts from the PTB, and their Czech translations. The specifics of the PCEDT representations are best observed in the procedure that converts the original PCEDT data to the SDP data format; see Miyao et al. 2014 ⁹.

Data

SemEval-2014 Task 8: The task provided three data sets: training, development, and test. On November 4, 2013, a sample of some 190 dependency graphs from the training data was published as trial data, demonstrating key characteristics of the task. From December 13, some 750,000 tokens of annotated text were made available as training data, with access information distributed through the task mailing list.
SemEval-2015 Task 18: This task is a re-run, with some extensions, of Task 8 at SemEval-2014.

Experiment & Results

The results are labeled parsing performance (F1 score) on both in-domain and out-of-domain test data, based on the English data set from the SemEval 2015 Task 18 closed track. The data split is 33,964 training sentences from Sections 00–19 of the WSJ corpus, 1,692 development sentences from Section 20, 1,410 sentences from Section 21 as in-domain test data, and 1,849 sentences sampled from the Brown Corpus as out-of-domain test data. In the tables, PSD is the name SemEval 2015 uses for the PCEDT-derived representation.
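
For readers unfamiliar with the metric, here is a minimal sketch of labeled F1 over dependency edges, assuming each graph is given as a collection of (head, label, dependent) triples. The function name is hypothetical, and the official scorer may differ in details such as how top nodes are counted.

```python
def labeled_f1(gold_edges, predicted_edges):
    """Labeled F1: an edge counts as correct only if head, label and
    dependent all match a gold edge."""
    gold, predicted = set(gold_edges), set(predicted_edges)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative edges only: the prediction gets one label wrong.
gold = [(1, "compound", 2), (3, "ARG1", 2), (3, "ARG2", 4)]
predicted = [(1, "compound", 2), (3, "ARG1", 2), (3, "ARG1", 4)]
score = labeled_f1(gold, predicted)
```

Here two of the three predicted edges are correct, so precision, recall and F1 are all 2/3.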

In-Domain

paper	model	DM	PAS	PSD
ACL-2018	Dozat & Manning Stanford University	93.7	93.9	81.0
ACL-2017	Peng et al. University of Washington & Carnegie Mellon University	90.4	92.7	78.5
AAAI-2018	Wang et al. Harbin Institute of Technology	90.3	91.7	78.6
Semeval-2015	Du et al. Peking University	89.1	91.3	75.7
Semeval-2015	Almeida & Martins Priberam Labs & Instituto de Telecomunicacoes	88.2	90.9	76.4

Out-Of-Domain

paper	model	DM	PAS	PSD
ACL-2018	Dozat & Manning Stanford University	88.9	90.6	79.4
ACL-2017	Peng et al. University of Washington & Carnegie Mellon University	85.3	89.0	76.4
AAAI-2018	Wang et al. Harbin Institute of Technology	84.9	87.6	75.9
Semeval-2015	Du et al. Peking University	81.8	87.2	73.3
Semeval-2015	Almeida & Martins Priberam Labs & Instituto de Telecomunicacoes	81.8	86.9	74.8

Logical Form Representation

First-order logic is used as a meaning representation language (MRL), which is designed by the creator of the application to suit the application’s needs, independent of natural language. Some MRL data sets are based on Prolog databases; two well-known semantic parsing benchmark data sets of this kind are Jobs and Geo ¹⁰.

Here we explain the features of the Geoquery representation language through a sample query:

Input: "What is the largest city in Texas?"
Query: answer(C, largest(C, (city(C), loc(C, S), const(S, stateid(texas)))))

Objects are represented as logical terms and are typed with a semantic category using logical functions applied to possibly ambiguous English constants (e.g. stateid(Mississippi), riverid(Mississippi)). Relationships between objects are expressed using predicates; for instance, loc(X, Y) states that X is located in Y.

We also need to handle quantifiers such as ‘largest’. We represent these using meta-predicates for which at least one argument is a conjunction of literals. For example, largest(X, Goal) states that the object X satisfies Goal and is the largest object that does so, using the appropriate measure of size for objects of its type (e.g. area for states, population for cities). Finally, an unspecified object required as an argument to a predicate can appear elsewhere in the sentence, requiring the use of the predicate const(X, C) to bind the variable X to the constant C.
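
To show how such a query can be executed, the sketch below mimics the sample query with plain Python functions over a toy database. Everything here is an assumption made for illustration: the city names and population figures are placeholders rather than Geoquery data, and the real system evaluates the query in Prolog.

```python
# Hypothetical toy database; names and populations are placeholders.
CITY_FACTS = {
    "city_a": {"state": "texas", "population": 300},
    "city_b": {"state": "texas", "population": 200},
    "city_c": {"state": "ohio", "population": 900},
}

def city(c):
    """city(C): C is a known city."""
    return c in CITY_FACTS

def loc(c, s):
    """loc(C, S): city C is located in state S."""
    return CITY_FACTS[c]["state"] == s

def largest(candidates, goal, size):
    """largest(X, Goal): the X that satisfies Goal and has the greatest size."""
    satisfying = [x for x in candidates if goal(x)]
    return max(satisfying, key=size) if satisfying else None

def answer_largest_city_in(state_name):
    # const(S, stateid(texas)) binds the variable S to a constant.
    s = state_name

    def goal(c):
        # The conjunction city(C), loc(C, S) from the query.
        return city(c) and loc(c, s)

    def population(c):
        # Size measure for cities, as described in the text.
        return CITY_FACTS[c]["population"]

    return largest(CITY_FACTS, goal, population)

result = answer_largest_city_in("texas")
```

The meta-predicate `largest` takes the conjunction as a function, which mirrors how its second argument in the logical form is a goal rather than a single object.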

Data

Geo 880: A set of 880 queries to a database of U.S. geography containing basic information about the U.S. states, such as population, area, capital city, and neighboring states. It was originally annotated with Prolog-style semantics. Geo has 880 instances split into a training set of 680 examples and a test set of 200 examples ¹¹.
Jobs 640: A set of 640 queries to a database of computer-related job postings, such as job announcements, from the USENET newsgroup austin.jobs. Information from these job postings is extracted to create a database that contains the following types of information: 1) the job title, 2) the company, 3) the recruiter, 4) the location, 5) the salary, 6) the languages and platforms used, and 7) required or desired years of experience and degrees. The training-test split contains 500 training and 140 test instances.
ATIS: A set of 5,410 queries to a flight booking system, represented as lambda-calculus logical forms. The ATIS database consists of data obtained from the Official Airline Guide (OAG, 1990), organized under a relational schema. The database remained fixed throughout the pilot phase. It contains information about flights, fares, airlines, cities, airports, and ground services, and includes twenty-five supporting tables. The large majority of the questions posed by subjects can be answered from the database with a single relational query.

Experiment & Results

The results are accuracies, defined as the proportion of input sentences that are correctly parsed to their gold-standard logical forms.
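
The accuracy defined above can be sketched as an exact-match comparison. This is a simplified assumption: the whitespace normalization is my own choice for illustration, and individual papers may apply different matching criteria.

```python
def normalize(logical_form):
    """Remove all whitespace so formatting differences do not count as errors."""
    return "".join(logical_form.split())

def exact_match_accuracy(gold_forms, predicted_forms):
    """Proportion of predictions identical to the gold logical form."""
    if len(gold_forms) != len(predicted_forms):
        raise ValueError("gold and predicted lists must have equal length")
    if not gold_forms:
        return 0.0
    matches = sum(
        normalize(g) == normalize(p)
        for g, p in zip(gold_forms, predicted_forms)
    )
    return matches / len(gold_forms)

gold = [
    "answer(C, largest(C, (city(C), loc(C, S), const(S, stateid(texas)))))",
    "answer(S, (state(S), loc(C, S), const(C, cityid(austin))))",
]
predicted = [
    "answer(C,largest(C,(city(C),loc(C,S),const(S,stateid(texas)))))",
    "answer(S, state(S))",
]
accuracy = exact_match_accuracy(gold, predicted)
```

The first prediction differs from the gold form only in spacing and counts as correct; the second does not match, so the accuracy is 0.5.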

paper	model	Geo880	Jobs640	ATIS
EMNLP-18	Xu et al. Tencent AI Lab & IBM Research & Peking University	88.1	91.2	85.9
ACL-17	Rabinovich et al. UC Berkeley	85.7	92.9	85.3
ACL-11	Liang et al. UC Berkeley	87.9	90.7	-
ACL-14	Wang et al. University of Washington & Allen Institute for AI	90.4	-	91.3
EMNLP-13	Kwiatkowski et al. University of Washington	89.0	-	-
NAACL-15	Zhao and Huang City University of New York	88.9	85.0	84.2
ACL-13	Poon Microsoft Research	-	-	83.5
EMNLP-11	Kwiatkowski et al. University of Edinburgh & University of Washington	88.6	-	82.8
EMNLP-10	Kwiatkowski et al. University of Edinburgh & University of Washington	87.9	-	71.4
ACL-16	Dong & Lapata -Seq2Tree University of Edinburgh	87.1	90.0	85.3
ACL-07	Wong and Mooney University of Texas	86.6	-	-
EMNLP-CoNLL-07	Zettlemoyer and Collins MIT	86.1	-	84.6
ACL-16	Dong & Lapata -Seq2Seq University of Edinburgh	85.0	87.1	84.2
ACL-16	Jia & Liang Stanford University	85.0	-	76.3
EMNLP-08	Lu et al. Singapore-MIT alliance & National University of Singapore & MIT	81.8	-	-
ECML-01	Tang & Mooney University of Texas	79.8	79.4	-
UAI-05	Zettlemoyer and Collins MIT	79.3	79.3	-
IUI-03	Popescu et al. University of Washington	77.5	88.0	-
NAACL-06	Wong and Mooney University of Texas	74.8	-	-
CoNLL-05	Ge and Mooney University of Texas	72.3	-	-
COLING-ACL-06	Kate and Mooney University of Texas	71.7	-	-

SQL Query

In recent years, there has been considerable interest in automated synthesis of programs from informal specifications. A particularly promising application domain for such computer-aided programming techniques is the automated synthesis of database queries. Although many end-users need to query data stored in some relational database, they typically lack the expertise to write complex queries in declarative query languages such as SQL. As a result, there has been a flurry of interest in automatically synthesizing SQL queries from informal specifications. Here we will talk about the techniques that can generate SQL queries from natural language (NL) descriptions.

For example, for the sentence “how many CFL teams are from York College”, we will have an SQL query like:

SELECT COUNT CFL Team FROM
CFLDraft WHERE College = "York"
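
The query above is written in a simplified notation. As a sketch of what executing it could look like, the snippet below runs an equivalent standard SQL statement against an in-memory SQLite table. The table rows and the helper name `count_cfl_teams` are hypothetical, and the two-column schema is a cut-down assumption rather than the original table.

```python
import sqlite3

def count_cfl_teams(rows, college):
    """Count rows of a toy CFLDraft table whose College equals `college`."""
    conn = sqlite3.connect(":memory:")
    # Column names containing spaces must be quoted in standard SQL.
    conn.execute('CREATE TABLE CFLDraft ("CFL Team" TEXT, College TEXT)')
    conn.executemany("INSERT INTO CFLDraft VALUES (?, ?)", rows)
    (count,) = conn.execute(
        'SELECT COUNT("CFL Team") FROM CFLDraft WHERE College = ?',
        (college,),
    ).fetchone()
    conn.close()
    return count

# Placeholder rows for illustration only.
ROWS = [
    ("Team A", "York"),
    ("Team B", "York"),
    ("Team C", "Other College"),
]
york_count = count_cfl_teams(ROWS, "York")
```

Passing the college name as a bound parameter rather than formatting it into the string keeps the sketch safe against malformed input.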

Data

WikiSQL: WikiSQL is a collection of questions, corresponding SQL queries, and SQL tables. It is the largest hand-annotated semantic parsing data set to date: it is an order of magnitude larger than other data sets that have logical forms, in terms of both the number of examples and the number of tables. The corpus consists of 80,654 hand-annotated instances of natural language questions, SQL queries, and SQL tables extracted from 24,241 HTML tables from Wikipedia ¹².

Experiment and Results

For the experiments and results on WikiSQL, see https://github.com/salesforce/WikiSQL.

1. Rohit J. Kate and Yuk Wah Wong. 2010. Semantic parsing: The task, the state of the art and the future. Tutorial Abstracts of ACL 2010, page 6.
2. Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for SemBanking. In Proc. of the 7th Linguistic Annotation Workshop and Interoperability with Discourse.
3. Oepen, S., Kuhlmann, M., Miyao, Y., Zeman, D., Flickinger, D., Hajic, J., Ivanova, A., & Zhang, Y. 2014. SemEval 2014 Task 8: Broad-coverage semantic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland.
4. Gildea, D., & Jurafsky, D. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3), 245–288.
5. Flickinger, D., Zhang, Y., & Kordoni, V. 2012. DeepBank. A dynamically annotated treebank of the Wall Street Journal. In Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories (p. 85–96). Lisbon, Portugal: Edições Colibri.
6. Oepen, S., & Lønning, J. T. 2006. Discriminant-based MRS banking. In Proceedings of the 5th International Conference on Language Resources and Evaluation (p. 1250–1255). Genoa, Italy.
7. Ivanova, A., Oepen, S., Øvrelid, L., & Flickinger, D. 2012. Who did what to whom? A contrastive study of syntacto-semantic dependencies. In Proceedings of the Sixth Linguistic Annotation Workshop (p. 2–11). Jeju, Republic of Korea.
8. Hajic, J., Hajicová, E., Panevová, J., Sgall, P., Bojar, O., Cinková, S., … & Semecký, J. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In LREC (pp. 3153–3160).
9. Miyao, Y., Oepen, S., & Zeman, D. 2014. In-house: An ensemble of pre-existing off-the-shelf parsers. In Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland.
10. Lappoon R. Tang and Raymond J. Mooney. 2001. Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th ECML, pages 466–477, Freiburg, Germany.
11. Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st UAI, pages 658–666, Toronto, ON.
12. Zhong, V., Xiong, C., & Socher, R. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.

Zi Lin

Semantic Parsing
