Form for querying parsed text with CorpusSearch

© 2011 Anthony Kroch and Beth Randall

In this demo, each corpus of the PPCHE is represented by ten texts.

  • A query of a parsed database specifies the properties that a parse tree in the database must exhibit in order to count as a hit. This specification can be thought of as a subtree template that the parse tree must instantiate.

  • The Query Tree on the right side of this page is such a subtree template. It appears initially as a minimal tree with a root and one branch but is expandable as the need arises.

    • The ?DOM pop-up menu on the tree branch lets the user specify whether the branch represents domination (Dom) or immediate domination (iDom). The upper text box takes the label of the subtree root; the lower text box takes the dominated constituent, which may be a node label or a leaf. A node label is a phrasal or part-of-speech tag. A leaf is a word or grammatical formative.
    • Clicking on a button adds or removes a branch below that button. Each new branch has its own domination pop-up and its own text boxes for labels and leaves. Branches are added below the button, to the right of any existing branches. If a node dominates multiple branches, they are removed starting from the right.
    • Between adjacent branches at a given level of a query tree there is a ?PRE pop-up menu that lets the user specify the linear order of the adjacent branches. The symbol > specifies precedence and >> specifies immediate precedence.

  • The Search Domain box should contain a node label that specifies the syntactic domain within which a search is to be carried out. For example, one might search for PPs within sentences or within NPs.

  • Once the Query Tree and Search Domain have been specified and a directory of parsed files to be searched has been chosen, the query is executed by clicking on the Submit Query button. The result is returned on a new web page.

  • The submit button is automatically disabled after submission until some change is made to a text entry box on the form. To resubmit the same query, click on any text box and re-enter its contents.

  • The Annotation Manual for the Penn Historical Corpora is available on the web. Lists of part-of-speech tags and syntactic labels are included.

  • CorpusSearch, the search program that runs the queries entered on this web form, is more powerful than the functionality of the form suggests. Documentation for CorpusSearch is available at SourceForge, where a standalone version that runs from the command line is freely available.
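
The idea of a query as a subtree template can be illustrated with a short Python sketch. This is not CorpusSearch code and does not use its query language. The names Node, parse, dominated, precedes and count_hits are hypothetical, the example sentence and its tags are invented for illustration, labels are compared by exact match, and a single flag sets Dom or iDom for both branches, whereas the form has one pop-up per branch. The input is assumed to be a well-formed bracketed tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str                                  # tag, or the word itself for a leaf
    children: List["Node"] = field(default_factory=list)
    span: Tuple[int, int] = (0, 0)              # positions of first and last leaf covered


def parse(text: str) -> Node:
    """Read a bracketed tree such as (IP (NP (D the) (N king)) (VBD came))."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0          # index into tokens
    leaf_count = 0   # number of leaves read so far

    def read() -> Node:
        nonlocal pos, leaf_count
        node = Node(tokens[pos + 1])            # tokens[pos] is "("
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:                               # a bare token is a leaf (a word)
                node.children.append(Node(tokens[pos], span=(leaf_count, leaf_count)))
                leaf_count += 1
                pos += 1
        pos += 1                                # consume ")"
        node.span = (node.children[0].span[0], node.children[-1].span[1])
        return node

    return read()


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node itself and every node it dominates."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def dominated(root: Node, label: str, immediate: bool) -> List[Node]:
    """Nodes labelled `label` that root dominates (Dom) or immediately dominates (iDom)."""
    pool = root.children if immediate else [n for n in subtrees(root) if n is not root]
    return [n for n in pool if n.label == label]


def precedes(a: Node, b: Node, immediate: bool) -> bool:
    """True if a ends before b begins; if immediate, with no leaf in between."""
    gap = b.span[0] - a.span[1]
    return gap == 1 if immediate else gap >= 1


def count_hits(tree: Node, domain: str, root: str, first: str, second: str,
               idom: bool = False, ipre: bool = False) -> int:
    """Count `domain` nodes that contain a `root` node matching the two-branch template."""
    hits = 0
    for scope in subtrees(tree):
        if scope.label != domain:
            continue
        hits += any(
            precedes(a, b, ipre)
            for r in subtrees(scope) if r.label == root
            for a in dominated(r, first, idom)
            for b in dominated(r, second, idom)
        )
    return hits


TREE = parse("(IP (NP (D the) (N king)) (VBD gave) (NP (N gold))"
             " (PP (P to) (NP (D the) (N church))))")

# Search domain IP: an IP immediately dominating an NP that precedes a PP.
print(count_hits(TREE, "IP", "IP", "NP", "PP", idom=True))             # prints 1
# Search domain NP: an NP immediately dominating a D immediately followed by an N.
print(count_hits(TREE, "NP", "NP", "D", "N", idom=True, ipre=True))    # prints 2
```

In this sketch the search domain determines what is counted: the second call reports two hits because two of the three NPs in the sentence instantiate the template, while the first reports one because the sentence contains a single IP.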