Zi Lin
by Zi Lin


  • NLP Tasks


Semantic Role Labeling (SRL), sometimes called shallow semantic parsing, is a process in natural language processing that assigns semantic roles to constituents or their head words in a sentence according to their relationship to the predicates expressed in the sentence. Typical semantic roles can be divided into core arguments and adjuncts. The core arguments include Agent, Patient, Source, Goal, etc, While the adjuncts include Location, Time, Manner, Cause, etc.

For example, in the sentence I ate breakfast quickly in the car this morning because I was in a hurry, we can have the labeled sentence like this:

[I]agent [eat]predicate [breakfast]patient [quickly]manner [in the car]location [this morning]time [because I was in a hurry]cause .


There are mainly three sources for SRL, namely, PropBank, FrameNet and VerbNet.


PropBank is a corpus in which the arguments of each predicate are annotated with their semantic roles in relation to the predicate 1. Currently, all the PropBank annotations are done on top of the phrase structure annotation of the Pen TreeBank 2. In addition to semantic role annotation, PropBank annotation requires the choice of a sense ID (also known as frameset or roleset ID) for each predicate. Thus, for each verb in every tree (representing the phrase structure of the corresponding sentence), PropBank has the instance that consists of the sense ID of the predicate (e.g. eat.01) and its arguments labeled with semantic roles.

An important goal is to provide consistent argument labels across different syntactic realizations of the same verb, as in…

[I]ARG0 [eat]predicate [breakfast]ARG1 [quickly]ARGM-MNR [in the car]ARGM-LOC [this morning]ARGM-TMP [because I was in a hurry]ARGM-CAU .

The argument structure of each predicate is outlined in the PropBank frame file for that predicate. The frame file denotes the correspondences between numbered arguments and semantic roles, as this is somewhat unique for each predicate.

Numbered arguments reflect either the arguments that are required for the valency of a predicate (e.g., agent, patient, benefactive), or if not required, those that occur with high-frequency in actual usage. Although numbered arguments correspond to slightly different semantic roles given the usage of each predicate, in general numbered arguments correspond to the following semantic roles:

ARG0 agent ARG3 starting point, benefactive, attribute
ARG1 patient ARG4 ending point
ARG2 instrument, benefactive, attribute ARGM modifier


The FrameNet project is to design a lexical database of English based on a linguistic theory called Frame Semantics. The meaning of most words can best be understood on the basis of a semantic frame, and the basic assumption on which the frames are built is that each word evokes a particular situation with particular participants3.

For example, the concept of eat usually involves a person doing the eating (Ingester), the food that is to be eaten (Ingestibles). In FrameNet, this event is represented as a frame called Ingestion, while the Ingester and Ingestibles are all frame elements (FEs), and predicates eat evoking frames are called lexical unites (LUs). One frame can have multiple lexical unites, e.g., for the frame Ingestion, we can have consume, devour, drink, eat, feed, etc.

[I]Ingester [eat]LUs [breakfast]Ingestibles [quickly]Manner [in the car]Place [this morning]Time because I was in a hurry.


[I]Agent [eat]Verb [breakfast]Patient quickly in the car this morning because I was in a hurry.


  • CoNLL-2005: The CoNLL-2005 dataset takes section 2-21 of the Wall Street Journal (WSJ) corpus as training set, and section 24 as development set. The test set consist of section 23 of the WSJ corpus (in-domain test) as well as 3 section from the Brown corpus (out-of-domain test) 4.

  • CoNLL-2012: The CoNLL-2012 dataset is extracted from the OntoNotes v5.0 corpus 5. The description and separation of training, development and test set can be found in Pardhan et al. (2013).

Experiments & Results

Note: All the statistics are obtained from the exact papers or papers citing the exact papers. Contact me if the statistics are not correct or you think your model should be listed as one of them.

With Gold Predicate

Of course, for each model, we can have different versions, e.g., using the word embeddings from Glove or ELMo, and different versions can lead to different results. Here I just list the results for one of the versions, most of which are single models without using any tricks. I will specify if there are tricks leveraged, e.g., LISA+D&M. You should look the original paper if you are interested in the model, and check whether there are some other versions of the model yielding better results. Please contact me (through email) if the statistics is incorrect.

Paper Model WSJ Brown CoNLL-2012
University of Massachusetts Amherst & Google AI Language
86.04 76.54 -
NAACL-18 Peter et al.+ELMo
Allen Institute for Artificial Intelligence & University of Washington
- - 84.6
AAAI-18 Tan et al.
Xiamen University & Tencent Technology Co.
84.8 74.1 82.7
ACL-18 He et al.
University of Washington
83.9 73.7 82.1
ACL-17 He et al.
University of Washington, Facebook AI Research & Allen Institute for Artificial Intelligence
83.1 72.1 81.7
EMNLP-17 Yang and Mitchell
Carnegie Mellon University
81.9 72.0 -
ACL-15 Zhou and Xu
Baidu Research
82.8 69.4 81.1
EMNLP-15 FitzGerald at al.
University of Washington & Google
80.3 72.2 80.1
TACL-15 Tackstrom et al.
79.9 71.3 79.4
CoNLL-13 Pradhan et al.
Boston Children’s Hospital and Harvard Medical School, University of Trento, QCRI, Brandeis University, National University of Singapore & University of Stuttgart
- - 77.5
CL-08 Toutanova et al.
Microsoft Research, University of California Berkeley & Stanford University
80.3 68.8 -
CL-08 Punyakanok et al.
BBN Technologies, University of Illinois at Urbana-Champaign & Microsoft Research
79.4 67.8 -

Without Gold Predicate

Paper Model WSJ Brown CoNLL-2012
University of Massachusetts Amherst & Google AI Language
86.90 78.25 83.38
ACL-18 He et al.+ELMo
University of Washington
86.0 76.1 82.9
ACL-17 He et al.+GloVe+PoE
University of Washington, Facebook AI Research & Allen Institute for Artificial Intelligence
82.7 70.1 78.4
  1. Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71-106. 

  2. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: the penn treebank. Computational Linguistics, 19(2):313-330. 

  3. Charles J Fillmore. 1968. The Case for Case. Universals in Linguistic Theory. 

  4. Xavier Carreras and Lluis Marquez. 2005. Introduction to the conll-2005 shared task: Semantic role labeling. In CoNLL. 

  5. Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bjorkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards robust linguistic analysis using ontonotes. In CoNLL.