Please note that we have revised our earlier plan, which foresaw parallel sessions; in response to multiple requests, the sessions will now run in sequence. There will be four sessions.
Srinivas Bangalore, AT&T Research, Florham Park, NJ: Corpora, Evaluation and Generation
The availability of a parse-annotated treebank (e.g. the Penn Treebank) and a parse evaluation metric (e.g. Parseval) has led to an ever-increasing number of models for stochastic parsing. The availability of corpora has also spurred a methodology for developing large-scale grammars capable of parsing real-world texts (e.g. XTAG, HPSG).
It appears that some aspects of generation can benefit from the availability of an annotated corpus and an evaluation metric. In particular, we will focus on their contribution to the sentence planning and surface realization components of a generation system. Some of the questions we would like to raise for discussion include:
Eduard Hovy, USC/ISI, Los Angeles: Burning Issue for NLG: The Opportunities and Limits of Statistics-Based Generation
Since the early 1970s, many aspects of NLP (speech recognition, IR, word segmentation, part of speech tagging, and recently parsing and MT) have been addressed, some very successfully, by statistical methods. Often, these systems overcame exactly the problems that plague NLG systems: brittleness, domain-dependency, labour-intensive rule construction, and the inability to formulate clear criteria of choice in symbolic terms.
Over the past four years, an entirely new type of language generation system has made its appearance: the generator based on statistical knowledge. Are we witnessing the birth of a new paradigm in NLG? Will statistical systems allow NLG to evolve from an essentially research-only area to an area with true application-level technology?
This session is devoted to understanding better the opportunities and limits of statistics-based NLG. It will focus on three major points:
1. What does `statistical NLG' mean, exactly?
... three case studies, in brief
2. What is `statistical' knowledge? Can all the knowledge required for NLG be `statistical'? If not, why not?
... general characteristics of `statistical' knowledge in NLP systems, and the nature of the four kinds of knowledge required for NLG
3. How can one expect research on statistical NLG to proceed, in general terms? What will statistics not be able to do (ever)?
... a hierarchy of increasing sophistication of statistical models. The kinds of things statistical models do not do
Daniel Marcu, USC/ISI, Los Angeles: Summarization and Generation
During the last five years, dozens of "summarization" systems have been produced by university and research labs, news providers, and Internet-based dot-coms. The vast majority of these "summarizers" are extraction systems: they identify clauses and sentences that are important in the input texts and concatenate them, often producing incoherent outputs that contain dangling references and abrupt topic shifts.
Traditionally, the NLG community has focused on mapping abstract representations into well-written texts. But recently established markets desperately need NLG technologies capable of producing coherent texts out of text fragments extracted from single and multiple documents, fragments that may be written at different levels of competency and in multiple languages and styles. Over the next five years, will these markets induce the NLG community to shift its research focus? Will the community end up concentrating primarily on generating well-written texts out of text fragments and/or badly written texts? What algorithms and techniques are needed to solve this type of generation problem?
This session is devoted to discussing open problems and opportunities that lie at the boundary between text summarization and natural language generation.
Chris Mellish, University of Edinburgh: What are reusable modules for NLG?
Questions to be addressed would include:
Last changed: 31 May 2000, Stephan Busemann firstname.lastname@example.org