PC-PATR Function Library Reference Manual

functions for unification based parsing

version 1.2

January 2000

by Stephen McConnel


1. Introduction to the PC-PATR function library

PC-PATR is an implementation for personal computers of the PATR-II computational linguistic formalism. The PATR-II formalism can be viewed as a computer language for encoding linguistic information. It does not presuppose any particular theory of syntax. It was originally developed by Stuart M. Shieber at Stanford University in the early 1980s. A PATR-II grammar consists of a set of rules and a lexicon. Each rule consists of a context-free phrase structure rule and a set of feature constraints, that is, unifications on the feature structures associated with the constituents of the phrase structure rules. The lexicon provides the items that can replace the terminal symbols of the phrase structure rules, that is, the words of the language together with their relevant features.

This function library contains the processing functions used by PC-PATR and related programs. It has been developed with the goal of making it easier to cast PATR-II style parsing into different frameworks. The first use of this library has been to add a morphotactic component to PC-Kimmo consisting of a PC-PATR word parser.

PC-PATR (and thus this function library) is still under development. The author would appreciate feedback directed to the following address:

Stephen McConnel                 (972)708-7361 (office)
Language Software Development    (972)708-7561 (fax)
SIL International
7500 W. Camp Wisdom Road
Dallas, TX 75236                 steve@acadcomp.sil.org
U.S.A.                        or Stephen_McConnel@Sil.org

2. Variable and function naming conventions

The basic goal behind choosing names in the PC-PATR function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name.

2.1 Preprocessor macro names

Preprocessor macro names are written entirely in capital letters. If the name requires more than one word for an adequate description, the words are joined together with intervening underscore (_) characters.

2.2 Data structure names

Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, relying on the capitalization pattern to make them readable as separate words.

2.3 Variable names

Variable names in the PC-PATR function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book Code Complete on pages 202-206.

Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix.

2.3.1 Type prefix

The type prefix has the following basic possibilities:

b
a Boolean, usually encoded as a char, short, or int
c
a character, usually a char but sometimes a short or int
d
a double precision floating point number, that is, a double
e
an enumeration, encoded as an enum or as a char, short, or int
i
an integer, that is, an int, short, long, or (rarely) char
s
a data structure defined by a struct statement
sz
a NUL (that is, zero) terminated character string
pf
a pointer to a function

In addition, the basic types may be prefixed by these qualifiers:

u
indicates that an integer or a character is unsigned
a
indicates an array of the basic type
p
indicates a pointer to the type, possibly a pointer to an array or to a pointer

2.3.2 Descriptive name

The descriptive name portion of a variable name consists of one or more capitalized words concatenated together. There are no underscores (_) separating these words from each other, or from the type prefix. For the PC-PATR function library, the descriptive name for global variables begins with PATR.

2.3.3 Scope suffix

The scope suffix has these possibilities:

_g
indicates a global variable accessible throughout the program
_m
indicates a module (semiglobal) variable accessible throughout the file (declared static)
_in
indicates a function argument used for input
_out
indicates a function argument used for output (must be a pointer)
_io
indicates a function argument used for both input and output (must be a pointer)
_s
indicates a function variable that retains its value between calls (declared static)

The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call.

2.4 Function names

Global function names in the PC-PATR function library have two parts: a verb that is all lowercase followed by a noun phrase containing one or more capitalized words. These pieces are concatenated without any intervening underscores (_). For the PC-PATR library functions, the noun phrase section includes PATR.

2.5 Examples

Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.

SAMPLE_NAME
is a preprocessor macro.
SampleName
is a data structure.
pSampleName
is a local pointer variable.
writeSampleName
is a function (that may apply to a data structure named SampleName).
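The following fragment is a small illustration of these conventions applied together. None of its names belong to the PC-PATR library; SampleName, countSampleWords, and iCallCount_m are hypothetical. It declares a macro, a data structure, a module variable, and a function whose arguments carry the _in and _out scope suffixes.

```c
/* preprocessor macro: all capitals, words joined by underscores */
#define MAX_SAMPLE_WORDS 64

/* data structure: capitalized words with no underscores */
typedef struct sample_name {
    char *               pszWord;    /* sz: NUL terminated string; p: pointer */
    int                  iLength;    /* i: integer */
    struct sample_name * pNext;      /* p: pointer to another structure */
    } SampleName;

/* module (semiglobal) variable: declared static, _m suffix */
static int iCallCount_m = 0;

/*
 *  function: lowercase verb followed by a capitalized noun phrase.
 *  Counts the space separated words in pszText_in, stores the count
 *  through piCount_out, and returns nonzero (TRUE) if any word was found.
 */
int countSampleWords(const char * pszText_in, int * piCount_out)
{
    int          iCount  = 0;    /* local variable: no scope suffix */
    int          bInWord = 0;    /* b: Boolean encoded as an int */
    const char * p;

    ++iCallCount_m;
    for ( p = pszText_in ; *p != '\0' ; ++p )
        {
        if (*p == ' ')
            bInWord = 0;
        else if (!bInWord)
            {
            bInWord = 1;         /* first character of a new word */
            ++iCount;
            }
        }
    *piCount_out = iCount;
    return iCount > 0;
}
```

Even without reading the body, a reader can tell that piCount_out is a pointer to an integer that the function writes, and that iCallCount_m persists across calls within this file.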

3. PC-PATR data structures

The PC-PATR functions operate on a number of different data structures. The most important of these are described in the following sections. The PC-PATR functions also use a number of other data structures internally, but it should not be necessary for a programmer to manipulate them directly.

3.1 PATRData

3.1.1 Definition

/*
 *  forward declarations of internal PATR data types
 */
typedef struct patr_grammar     PATRGrammar;
typedef struct patr_lexicon     PATRLexicon;

typedef struct {
    char                bFailure;
    char                bUnification;
    char                eTreeDisplay;
    char                bGloss;
    char                bGlossesExist;
    char                iFeatureDisplay;
    char                bCheckCycles;
    char                bTopDownFilter;
    short               iMaxAmbiguities;
    short               iDebugLevel;
    char                cComment;
    char                bSilent;
    char                bShowWarnings;
    char                bPromoteDefAtoms;
    time_t              iMaxProcTime;
    FILE *              pLogFP;
    char *              pszGrammarFile;
    PATRGrammar *       pGrammar;
    char *              pszRecordMarker;
    char *              pszWordMarker;
    char *              pszGlossMarker;
    char *              pszCategoryMarker;
    char *              pszFeatureMarker;
    PATRLexicon *       pLexicon;
    int                 iCurrentIndex;
    int                 iParseCount;
    } PATRData;

3.1.2 Description

The PATRData data structure collects the information used for data processing within the PC-PATR functions. Its general purpose is to reduce the number of parameters needed by the various functions.

bFailure
causes parser failures to be preserved and displayed if TRUE (nonzero).
bUnification
enables unification while parsing if TRUE (nonzero). If FALSE, the parser acts only as a context free chart parser, which usually produces much more ambiguous output.
eTreeDisplay
is the tree display mode, one of these symbolic constant values:
PATR_NO_TREE
prevents any output of parse trees.
PATR_FLAT_TREE
displays parse trees as parenthesized, nested lists. For example,
(S (NP (N  cows))(VP (VerbalP (V  eat))(NP (N  grass))))
PATR_FULL_TREE
displays parse trees with a text representation of the tree structure. For example,
        S
   _____|_____
  NP        VP
   |      ___|____
   N   VerbalP  NP
 cows     |      |
          V      N
         eat   grass
PATR_INDENTED_TREE
displays parse trees in an indented (outline) fashion. For example,
S
    NP
        N  cows
    VP
        VerbalP
            V  eat
        NP
            N  grass
bGloss
causes glosses (if they exist) to be displayed if TRUE.
bGlossesExist
is set automatically according to whether or not glosses exist when the lexicon is loaded.
iFeatureDisplay
is a bit vector that encodes the feature display mode:
iFeatureDisplay & PATR_FEATURE_ON
allows the output of feature structures. If this bit is cleared (zero), feature structures are not written to the output, but may still be used in parsing.
iFeatureDisplay & PATR_FEATURE_FLAT
causes feature output to be "flattened" into a compact form which is less readable for humans, but just as easily parsed by a computer program. If this bit is cleared, a top level feature structure looks like this in the output:
S:
[ cat:   S
  head:    [ agr:   $1[ 3sg:   - ]
             finite:+
             pos:   V
             vform: BASE ]
  subj:    [ cat:   NP
             head:    [ agr:   $1
                        case:  NOM
                        number:PL
                        pos:   N
                        proper:-
                        verbal:- ] ] ]
On the other hand, if this bit is set, the same feature structure would be written like this:
S:      [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ]
          subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N
          proper:- verbal:- ] ] ]
iFeatureDisplay & PATR_FEATURE_ALL
causes all of the feature structures to be written to the output, not just the top level feature structure. If this bit is cleared, the feature structure output associated with a parse might look like one of the previous two examples. If this bit is set, the output might look like the following instead:
S_1:    [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ]
          subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N
          proper:- verbal:- ] ] ]

NP_2:   [ cat:NP head:[ agr:[ 3sg:- ] number:PL pos:N proper:-
          verbal:- ] ]

N_3:    [ cat:N gloss:`cow head:[ agr:[ 3sg:- ] number:PL pos:N
          proper:- verbal:- ] lex:cows root_pos:N ]

VP_4:   [ cat:VP head:[ finite:+ pos:V vform:BASE ] ]

VerbalP_5:      [ cat:VerbalP head:[ finite:+ pos:V vform:BASE ] ]

V_6:    [ cat:V gloss:`eat head:[ pos:V vform:BASE ] lex:eat
          root_pos:V ]

NP_7:   [ cat:NP head:[ agr:[ 3sg:+ ] number:SG pos:N proper:-
          verbal:- ] ]

N_8:    [ cat:N gloss:`grass head:[ agr:[ 3sg:+ ] number:SG pos:N
          proper:- verbal:- ] lex:grass root_pos:N ]
iFeatureDisplay & PATR_FEATURE_TRIM
prevents empty feature structures from being written to the output. If this bit is cleared, then a feature structure might look like this:
VP_3:
[ cat:   VP
  head:    [ form:  finite
             trans:   [ pred:  sleep
                        arg1:  $1[]
                        arg2:  [] ] ]
  syncat:  [ first:   [ cat:   NP
                        head:    [ agreement:  [ person:third
                                              number:singular ]
                                   trans: $1 ] ]
             rest:  end ] ]
If this bit is set, the same data structure would look like this:
VP_3:
[ cat:   VP
  head:    [ form:  finite
             trans:   [ pred:  sleep ] ]
  syncat:  [ first:   [ cat:   NP
                        head:    [ agreement:  [ person:third
                                              number:singular ] ] ]
             rest:  end ] ]
bCheckCycles
determines whether to enable checking for parse cycles while parsing.
bTopDownFilter
determines whether to enable top down filtering while parsing.
iMaxAmbiguities
is the maximum number of alternative parse trees to show in the output.
iDebugLevel
is the degree of debugging output desired (0 means none).
cComment
is the character that begins a comment in an input line. (PATR_DEFAULT_COMMENT is a symbol for the default value.)
bSilent
determines whether to disable messages to the "standard error" stream (stderr).
bShowWarnings
determines whether to enable warnings as well as error messages.
bPromoteDefAtoms
determines whether default atomic values in features loaded from the lexicon are "promoted" to ordinary atomic values before parsing. (This can affect feature unification since a conflicting default value does not cause a failure: the default value merely disappears.)
iMaxProcTime
determines the maximum number of seconds a parse is allowed to take. A value of 0 means no limit.
pLogFP
is the FILE pointer for an output log file (NULL means none).
pszGrammarFile
points to the name of the current PC-PATR grammar file (NULL means none).
pGrammar
points to the current PC-PATR grammar data (NULL means none).
pszRecordMarker
points to the standard format marker for lexicon records.
pszWordMarker
points to the standard format marker for lexicon word fields.
pszGlossMarker
points to the standard format marker for lexicon gloss fields.
pszCategoryMarker
points to the standard format marker for lexicon category fields.
pszFeatureMarker
points to the standard format marker for lexicon feature fields.
pLexicon
points to the current PC-PATR lexicon (NULL means none).
iCurrentIndex
is used for internal processing. It records the index number of the current edge.
iParseCount
is used for internal processing. It records the number of parses found.

3.1.3 Source File

`patr.h'
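Because iFeatureDisplay is a bit vector, its options are combined with bitwise OR and tested with bitwise AND. The sketch below shows that pattern. It assumes placeholder values for the PATR_FEATURE_* constants, used only when patr.h has not been included; the real values are defined in patr.h and may differ. The helper functions are hypothetical, not part of the library.

```c
/*
 *  Placeholder bit values, used only if patr.h was not included.
 *  The real values come from patr.h and may differ.
 */
#ifndef PATR_FEATURE_ON
#define PATR_FEATURE_ON    0x01
#define PATR_FEATURE_FLAT  0x02
#define PATR_FEATURE_ALL   0x04
#define PATR_FEATURE_TRIM  0x08
#endif

/*
 *  Combine display options into a value suitable for
 *  PATRData.iFeatureDisplay (declared as a char).  Feature output
 *  itself (PATR_FEATURE_ON) is always enabled here.
 */
char makeFeatureDisplay(int bFlat_in, int bAll_in, int bTrim_in)
{
    int iMode = PATR_FEATURE_ON;

    if (bFlat_in)
        iMode |= PATR_FEATURE_FLAT;     /* compact, one-line style output */
    if (bAll_in)
        iMode |= PATR_FEATURE_ALL;      /* every node, not just the top */
    if (bTrim_in)
        iMode |= PATR_FEATURE_TRIM;     /* omit empty feature structures */
    return (char)iMode;
}

/* Return nonzero if feature output is on and flattened. */
int isFlatFeatureDisplay(char iFeatureDisplay_in)
{
    return (iFeatureDisplay_in & PATR_FEATURE_ON) &&
           (iFeatureDisplay_in & PATR_FEATURE_FLAT);
}
```

A caller holding a PATRData pointer might then write pPATR->iFeatureDisplay = makeFeatureDisplay(1, 0, 1); to request flattened, trimmed output of the top level feature structure only.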

3.2 PATREdgeList

3.2.1 Definition

/*
 *  forward declaration of an internal PATR data type
 */
typedef struct patr_edge PATREdge;

typedef struct patr_edge_list {
    PATREdge *                  pEdge;
    struct patr_edge_list *     pNext;
    } PATREdgeList;

3.2.2 Description

The PATREdgeList data structure encodes a list of parse results returned by the PC-PATR parsing functions.

pEdge
points to a parse tree encoded as an edge in the parse chart.
pNext
points to the next element of the list, which holds the next parse tree.

3.2.3 Source File

`patr.h'
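A caller usually walks a PATREdgeList with an ordinary linked-list loop. The sketch below counts the parses in such a list. It repeats the declarations from patr.h only so that it compiles on its own, treats PATREdge as opaque, and assumes (as with the library's other lists) that a NULL pNext ends the list. countParseResults is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/*
 *  Local copies of the patr.h declarations; a real caller would
 *  #include "patr.h" instead.  PATREdge is never dereferenced here.
 */
typedef struct patr_edge PATREdge;

typedef struct patr_edge_list {
    PATREdge *                  pEdge;
    struct patr_edge_list *     pNext;
    } PATREdgeList;

/* Count the parse results in a list, stopping at the NULL terminator. */
int countParseResults(const PATREdgeList * pParses_in)
{
    int                  iCount = 0;
    const PATREdgeList * pItem;

    for ( pItem = pParses_in ; pItem != NULL ; pItem = pItem->pNext )
        ++iCount;
    return iCount;
}
```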

3.3 PATRFeatureTags

3.3.1 Definition

#include "strlist.h"

typedef struct patr_feat_tags {
    StringList *                pFeaturePath;
    char *                      pszStartTag;
    char *                      pszEndTag;
    struct patr_feat_tags *     pNext;
    } PATRFeatureTags;

3.3.2 Description

The PATRFeatureTags data structure contains information needed to write feature structures to an output file in a stylized fashion.

pFeaturePath
points to a feature path encoded as a list of feature label strings.
pszStartTag
points to the text string written to the output file before the given feature value.
pszEndTag
points to the text string written to the output file after the given feature value.
pNext
points to another PATRFeatureTags data structure. This facilitates building a list of such items.

3.3.3 Source File

`patr.h'
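The sketch below shows how a list of PATRFeatureTags items might be searched for a feature path and used to wrap a value in its start and end tags. It is an illustration only: the StringList layout is a hypothetical stand-in for the one in strlist.h, the helper functions are not part of the library, and the output goes to a buffer rather than an output file so that the result is easy to check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for StringList from strlist.h. */
typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
    } StringList;

typedef struct patr_feat_tags {
    StringList *                pFeaturePath;
    char *                      pszStartTag;
    char *                      pszEndTag;
    struct patr_feat_tags *     pNext;
    } PATRFeatureTags;

/* Return nonzero if pPath_in holds exactly the iLength_in labels in apszPath_in. */
static int matchPath(const StringList * pPath_in,
                     const char * const * apszPath_in, int iLength_in)
{
    int i;

    for ( i = 0 ; i < iLength_in ; ++i, pPath_in = pPath_in->pNext )
        {
        if ((pPath_in == NULL) || (strcmp(pPath_in->pszString, apszPath_in[i]) != 0))
            return 0;
        }
    return pPath_in == NULL;            /* no labels left over */
}

/* Find the tags registered for a feature path, or NULL if there are none. */
const PATRFeatureTags * findFeatureTags(const PATRFeatureTags * pTags_in,
                                        const char * const * apszPath_in,
                                        int iLength_in)
{
    for ( ; pTags_in != NULL ; pTags_in = pTags_in->pNext )
        {
        if (matchPath(pTags_in->pFeaturePath, apszPath_in, iLength_in))
            return pTags_in;
        }
    return NULL;
}

/* Write a feature value surrounded by its start and end tags. */
int formatTaggedValue(char * pszBuffer_out, size_t uSize_in,
                      const PATRFeatureTags * pTags_in, const char * pszValue_in)
{
    return snprintf(pszBuffer_out, uSize_in, "%s%s%s",
                    pTags_in->pszStartTag, pszValue_in, pTags_in->pszEndTag);
}
```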

3.4 PATRLabeledFeature

3.4.1 Definition

/*
 *  forward declaration of an internal PATR data type
 */
typedef struct patr_feature PATRFeature;

typedef struct patr_labeled_feat {
    char *                      pszLabel;
    PATRFeature *               pFeature;
    struct patr_labeled_feat *  pNext;
    } PATRLabeledFeature;

3.4.2 Description

The PATRLabeledFeature data structure contains information needed to abbreviate a feature structure to a simple label (template name) while writing an output file.

pszLabel
points to the label (template name) associated with the pFeature value.
pFeature
points to a feature structure.
pNext
points to another PATRLabeledFeature data structure. This facilitates building a list of such items.

3.4.3 Source File

`patr.h'
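The lookup that this list supports can be sketched as a simple scan that returns the first matching label. To keep the sketch short it matches feature structures by pointer identity; that is a simplification made here, and the library presumably compares feature structure contents in a way this manual does not describe. findFeatureLabel is hypothetical, and PATRFeature is treated as opaque.

```c
#include <stddef.h>

typedef struct patr_feature PATRFeature;    /* opaque in this sketch */

typedef struct patr_labeled_feat {
    char *                      pszLabel;
    PATRFeature *               pFeature;
    struct patr_labeled_feat *  pNext;
    } PATRLabeledFeature;

/*
 *  Return the label for pFeature_in, or NULL if none applies.
 *  Simplification: features are matched by pointer identity only.
 */
const char * findFeatureLabel(const PATRLabeledFeature * pLabels_in,
                              const PATRFeature * pFeature_in)
{
    for ( ; pLabels_in != NULL ; pLabels_in = pLabels_in->pNext )
        {
        if (pLabels_in->pFeature == pFeature_in)
            return pLabels_in->pszLabel;
        }
    return NULL;
}
```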

3.5 PATRWord

3.5.1 Definition

/*
 *  forward declaration of an internal PATR data type
 */
typedef struct patr_categ PATRWordCategory;

typedef struct patr_word {
    int                 iWordNumber;
    char *              pszWordName;
    PATRWordCategory *  pCategories;
    struct patr_word *  pNext;
    } PATRWord;

3.5.2 Description

The PATRWord data structure represents a single word of the sentence fed to the PC-PATR parsing function. A sentence is represented by a linked list of these data structures.

iWordNumber
is the number of the word in the sentence.
pszWordName
is the orthographic wordform.
pCategories
points to a list of word categories for this word. (This allows words to be syntactically ambiguous.) Each word category contains the feature structure associated with one sense of the word.
pNext
points to the next word in the sentence. NULL marks the end of the sentence.

3.5.3 Source File

`patr.h'
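To make the shape of a sentence concrete, the sketch below builds a NULL-terminated PATRWord list from an array of wordforms and frees it again. It only illustrates the list structure: it leaves pCategories NULL (a real caller would obtain categorized words from functions such as buildPATRWord), it assumes words are numbered from 1, and buildSentence and freeSentence are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct patr_categ PATRWordCategory;    /* opaque in this sketch */

typedef struct patr_word {
    int                 iWordNumber;
    char *              pszWordName;
    PATRWordCategory *  pCategories;
    struct patr_word *  pNext;
    } PATRWord;

/* Free every word in a sentence built by buildSentence. */
void freeSentence(PATRWord * pSentence_in)
{
    while (pSentence_in != NULL)
        {
        PATRWord * pNext = pSentence_in->pNext;
        free(pSentence_in->pszWordName);
        free(pSentence_in);
        pSentence_in = pNext;
        }
}

/*
 *  Build a sentence from iCount_in wordforms, numbering the words
 *  from 1 (an assumption).  Returns NULL if memory runs out.
 */
PATRWord * buildSentence(const char * const apszWords_in[], int iCount_in)
{
    PATRWord *  pHead  = NULL;
    PATRWord ** ppLink = &pHead;       /* where the next word is attached */
    int         i;

    for ( i = 0 ; i < iCount_in ; ++i )
        {
        PATRWord * pWord = malloc(sizeof *pWord);
        if (pWord == NULL)
            {
            freeSentence(pHead);
            return NULL;
            }
        pWord->pszWordName = malloc(strlen(apszWords_in[i]) + 1);
        if (pWord->pszWordName == NULL)
            {
            free(pWord);
            freeSentence(pHead);
            return NULL;
            }
        strcpy(pWord->pszWordName, apszWords_in[i]);
        pWord->iWordNumber = i + 1;
        pWord->pCategories = NULL;     /* no categories in this sketch */
        pWord->pNext       = NULL;     /* NULL marks the end of the sentence */
        *ppLink = pWord;
        ppLink  = &pWord->pNext;
        }
    return pHead;
}
```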

4. The PC-PATR function library global variables

This chapter gives the proper usage information about each of the global variables found in the PC-PATR function library. The `patr.h' header file contains the extern declarations for all of these variables.

4.1 bCancelPATROperation_g

4.1.1 Syntax

#include "patr.h"

extern int      bCancelPATROperation_g;

4.1.2 Description

bCancelPATROperation_g can be set asynchronously to interrupt a PC-PATR parse that seems to be stuck.

4.1.3 Example

#include <signal.h>
#include "patr.h"
...
void sigint_handler(int iSignal_in)
{
bCancelPATROperation_g = TRUE;
signal(SIGINT, sigint_handler);
}
...
signal(SIGINT, sigint_handler);
...

4.1.4 Source File

`patrdata.c'

4.2 cPATRPatchSep_g

4.2.1 Syntax

#include "patr.h"

extern const char       cPATRPatchSep_g;

4.2.2 Description

cPATRPatchSep_g is used to separate the revision and patch level values when printing the PC-PATR version number. 'a' indicates an alpha release, 'b' indicates a beta release, and '.' indicates a production release.
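The separator sits between the revision and patch level numbers, so the same format as the section 4.5 example yields strings such as 1.2b3 (beta) or 1.2.3 (production). The sketch below captures that format and maps the separator to a release type. formatVersionString and describeReleaseType are hypothetical helpers, and the numbers in the usage are sample inputs, not actual PC-PATR release numbers.

```c
#include <stdio.h>

/* Format version, revision, separator, and patch level, e.g. "1.2b3". */
int formatVersionString(char * pszBuffer_out, size_t uSize_in,
                        int iVersion_in, int iRevision_in,
                        char cPatchSep_in, int iPatchlevel_in)
{
    return snprintf(pszBuffer_out, uSize_in, "%d.%d%c%d",
                    iVersion_in, iRevision_in, cPatchSep_in, iPatchlevel_in);
}

/* Name the release type encoded by the separator character. */
const char * describeReleaseType(char cPatchSep_in)
{
    switch (cPatchSep_in)
        {
        case 'a':  return "alpha";
        case 'b':  return "beta";
        case '.':  return "production";
        default:   return "unknown";
        }
}
```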

4.2.3 Example

See section 4.5 iPATRVersion_g.

4.2.4 Source File

`patrdata.c'

4.3 iPATRPatchlevel_g

4.3.1 Syntax

#include "patr.h"

extern const int        iPATRPatchlevel_g;

4.3.2 Description

iPATRPatchlevel_g is the current patch level of the PC-PATR function library and program. This is the third level version number, reflecting bug fixes or internal improvements that should be functionally invisible to users.

4.3.3 Example

See section 4.5 iPATRVersion_g.

4.3.4 Source File

`patrdata.c'

4.4 iPATRRevision_g

4.4.1 Syntax

#include "patr.h"

extern const int        iPATRRevision_g;

4.4.2 Description

iPATRRevision_g is the current revision level of the PC-PATR function library and program. This is the second level version number, reflecting changes to program behavior that require changes to the PC-PATR Reference Manual.

4.4.3 Example

See section 4.5 iPATRVersion_g.

4.4.4 Source File

`patrdata.c'

4.5 iPATRVersion_g

4.5.1 Syntax

#include "patr.h"

extern const int        iPATRVersion_g;

4.5.2 Description

iPATRVersion_g is the current version number of the PC-PATR function library and program. This is the top level version number, reflecting a major rewrite of the program or major changes that make it incompatible with earlier versions of the program.

4.5.3 Example

#include <stdio.h>
#include "patr.h"
...
fprintf(stderr,
        "PC-PATR version %d.%d%c%d (%s), Copyright %s SIL\n",
        iPATRVersion_g, iPATRRevision_g, cPATRPatchSep_g,
        iPATRPatchlevel_g, pszPATRDate_g, pszPATRYear_g);
#ifdef __DATE__
fprintf(stderr, pszPATRCompileFormat_g,
        pszPATRCompileDate_g, pszPATRCompileTime_g);
#else
if (pszPATRTestVersion_g != NULL)
    fputs(pszPATRTestVersion_g, stderr);
#endif
...

4.5.4 Source File

`patrdata.c'

4.6 pszPATRCompileDate_g

4.6.1 Syntax

#include "patr.h"

#ifdef __DATE__
extern const char *     pszPATRCompileDate_g;
#endif

4.6.2 Description

pszPATRCompileDate_g points to a string containing the date on which the PC-PATR library was compiled. It exists only if the C compiler preprocessor supports the __DATE__ constant.

4.6.3 Example

See section 4.5 iPATRVersion_g.

4.6.4 Source File

`patrdata.c'

4.7 pszPATRCompileFormat_g

4.7.1 Syntax

#include "patr.h"

#ifdef __DATE__
#ifdef __TIME__
extern const char *     pszPATRCompileFormat_g;
#endif
#endif

4.7.2 Description

pszPATRCompileFormat_g points to a printf style format string suitable for displaying pszPATRCompileDate_g and pszPATRCompileTime_g. It exists only if the C compiler preprocessor supports the __DATE__ and __TIME__ constants.
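The conditional declarations in sections 4.6 through 4.8 follow a common pattern: capture the compiler's __DATE__ and __TIME__ strings and print them through a format string. The sketch below shows that pattern with hypothetical stand-in names; the actual format string used by PC-PATR is not given in this manual, so the one here is only an example. Standard C defines __DATE__ as "Mmm dd yyyy" and __TIME__ as "hh:mm:ss".

```c
#include <stdio.h>

/*
 *  Hypothetical stand-ins for pszPATRCompileDate_g, pszPATRCompileTime_g,
 *  and pszPATRCompileFormat_g.  The format string is an invented example.
 */
#ifdef __DATE__
const char * pszCompileDate = __DATE__;      /* "Mmm dd yyyy" */
#endif
#ifdef __TIME__
const char * pszCompileTime = __TIME__;      /* "hh:mm:ss" */
#endif

#if defined(__DATE__) && defined(__TIME__)
const char * pszCompileFormat = "(compiled %s %s)\n";

/* Write the compile stamp into a buffer and return its length. */
int formatCompileStamp(char * pszBuffer_out, size_t uSize_in)
{
    return snprintf(pszBuffer_out, uSize_in, pszCompileFormat,
                    pszCompileDate, pszCompileTime);
}
#endif
```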

4.7.3 Example

See section 4.5 iPATRVersion_g.

4.7.4 Source File

`patrdata.c'

4.8 pszPATRCompileTime_g

4.8.1 Syntax

#include "patr.h"

#ifdef __TIME__
extern const char *     pszPATRCompileTime_g;
#endif

4.8.2 Description

pszPATRCompileTime_g points to a string containing the time at which the PC-PATR library was compiled. It exists only if the C compiler preprocessor supports the __TIME__ constant.

4.8.3 Example

See section 4.5 iPATRVersion_g.

4.8.4 Source File

`patrdata.c'

4.9 pszPATRDate_g

4.9.1 Syntax

#include "patr.h"

extern const char *     pszPATRDate_g;

4.9.2 Description

pszPATRDate_g points to a string containing the date on which the PC-PATR library was last modified.

4.9.3 Example

See section 4.5 iPATRVersion_g.

4.9.4 Source File

`patrdata.c'

4.10 pszPATRTestVersion_g

4.10.1 Syntax

#include "patr.h"

#ifndef __DATE__
extern const char *     pszPATRTestVersion_g;
#endif

4.10.2 Description

pszPATRTestVersion_g points to a string describing the test status of PC-PATR (either alpha or beta). If this is a production release version, it is set to NULL. It is defined only if the C compiler preprocessor does not support the __DATE__ constant.

4.10.3 Example

See section 4.5 iPATRVersion_g.

4.10.4 Source File

`patrdata.c'

4.11 pszPATRYear_g

4.11.1 Syntax

#include "patr.h"

extern const char *     pszPATRYear_g;

4.11.2 Description

pszPATRYear_g points to a string containing the year in which the PC-PATR library was last modified. This is suitable for a copyright notice assigning the copyright to SIL International.

4.11.3 Example

See section 4.5 iPATRVersion_g.

4.11.4 Source File

`patrdata.c'

5. PC-PATR functions

This chapter gives the proper usage information about each of the functions found in the PC-PATR function library. The prototypes and type definitions relevant to the use of these functions are all found in the `patr.h' header file.

5.1 addPATRLexItem

5.1.1 Syntax

#include "patr.h"

void addPATRLexItem(char *        pszWord_in,
                    char *        pszGloss_in,
                    char *        pszCategory_in,
                    char *        pszFeatures_in,
                    PATRFeature * pFeature_in,
                    PATRData *    pPATR_io);

5.1.2 Description

addPATRLexItem adds one entry to the PC-PATR lexicon stored in memory.

The arguments to addPATRLexItem are as follows:

pszWord_in
points to the orthographic string of the lexical item.
pszGloss_in
points to a gloss string for the lexical item.
pszCategory_in
points to the syntactic category for the lexical item.
pszFeatures_in
points to a space delimited list of feature (template) names associated with the lexical item.
pFeature_in
points to a feature structure associated with the lexical item. This is an alternative to pszFeatures_in.
pPATR_io
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.1.3 Return Value

none

5.1.4 Example

#include <string.h>
#include "patr.h"
#include "opaclib.h"

void storeTemplate(WordTemplate * pTemplate_in, PATRData * pPATR_in)
{
WordAnalysis *  pAnal;
char *          pszFeatures;
char *          p;

for ( pAnal = pTemplate_in->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
    {
    pszFeatures = NULL;
    if (pAnal->pszFeatures != NULL)
        {
        pszFeatures = duplicateString(pAnal->pszFeatures);
        while ((p = strchr(pszFeatures, '=')) != NULL)
            *p = ' ';
        }
    addPATRLexItem(pTemplate_in->pszSurfaceForm,
                   pAnal->pszAnalysis,
                   pAnal->pszCategory,
                   pszFeatures,
                   NULL, pPATR_in);
    if (pszFeatures != NULL)
        {
        freeMemory(pszFeatures);
        pszFeatures = NULL;
        }
    }
}

5.1.5 Source File

`patrlexi.c'

5.2 buildPATRWord

5.2.1 Syntax

#include "patr.h"

PATRWord * buildPATRWord(char *        pszLex_in,
                         char *        pszGloss_in,
                         char *        pszCat_in,
                         char *        pszFeatures_in,
                         PATRFeature * pPATRFeature_in,
                         PATRData *    pPATR_in);

5.2.2 Description

buildPATRWord converts the given information into the form needed for a PATR parse. This is used by the (X)AMPLE program in preparing a proposed word analysis for parsing with a word grammar.

The arguments to buildPATRWord are as follows:

pszLex_in
contains the lexical form of the morpheme. It must not be NULL.
pszGloss_in
contains a short gloss of the morpheme. It may be NULL.
pszCat_in
contains the category of the morpheme. It must not be NULL.
pszFeatures_in
contains zero or more feature template names separated by spaces or equal signs (=). It may be NULL.
pPATRFeature_in
is reserved for future use. It may be NULL.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the grammar (which includes template definitions). It must not be NULL.
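Because pszFeatures_in may separate template names with either spaces or equal signs, a caller that wants to inspect the string first must treat both characters as delimiters. (addPATRLexItem, by contrast, expects spaces only, which is why the section 5.1 example rewrites each '=' as a space.) The sketch below counts the template names under that rule; countTemplateNames is a hypothetical helper, and the library's own parsing of the string is not shown here.

```c
#include <stddef.h>

/* Return nonzero if c separates template names (a space or an equal sign). */
static int isTemplateSeparator(char c)
{
    return (c == ' ') || (c == '=');
}

/* Count the template names in pszFeatures_in, which may be NULL. */
int countTemplateNames(const char * pszFeatures_in)
{
    int          iCount = 0;
    const char * p      = pszFeatures_in;

    if (p == NULL)
        return 0;
    while (*p != '\0')
        {
        while (isTemplateSeparator(*p))          /* skip separators */
            ++p;
        if (*p == '\0')
            break;
        ++iCount;                                /* start of a name */
        while ((*p != '\0') && !isTemplateSeparator(*p))
            ++p;                                 /* skip over the name */
        }
    return iCount;
}
```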

5.2.3 Return Value

a pointer to a dynamically allocated PATRWord data structure containing the morpheme information

5.2.4 Example

#include "ample.h"              /* #includes "patr.h" */
#include "ampledef.h"
...
PATREdgeList * perform_word_parse(pAnal_in, pAmple_in)
AmpleHeadList * pAnal_in;
AmpleData *     pAmple_in;
{
AmpleHeadList *  hp;
AmpleAllomorph * ap;
AmpleMorpheme *  mp;
PATRWord *       pWord = NULL;
PATRWord *       pNewMorph  = NULL;
char *           pszLex;
char *           pszGloss;
char *           pszPATRCategory;
char *           pszFromCat;
char *           pszToCat;
char *           pszProps;
char *           pszFeatures;
/*
 * Convert the list of morphemes to what parseWithPATR() wants.
 */
for ( hp = pAnal_in ; hp ; hp = hp->pLeft )
    {
    ap = hp->pAllomorph;
    if (ap == NULL)
        continue;               /* should never happen */
    mp = ap->pMorpheme;
    if (mp == NULL)
        continue;               /* should never happen */
    if (mp->pszUnderForm != NULL)
        pszLex = mp->pszUnderForm;
    else
        pszLex = ap->pszAllomorph;
    if ((pszLex == NULL) || (*pszLex == NUL))
        pszLex = "0";
    pszGloss = mp->pszMorphName;
    if (mp->pszPATRCat != NULL)
        {
        pszPATRCategory = mp->pszPATRCat;
        }
    else
        {
        switch (hp->eType)
            {
            case AMPLE_PFX:
                pszPATRCategory = "Prefix";
                break;
            case AMPLE_IFX:
                if (    (hp->pRight == NULL) ||
                        (hp->pRight->eType == AMPLE_SFX) )
                    pszPATRCategory = "Suffix";
                else
                    pszPATRCategory = "Prefix";
                break;
            case AMPLE_SFX:
                pszPATRCategory = "Suffix";
                break;
            default:
                pszPATRCategory = "Root";
                break;
            }
        }
    if (hp->eType == AMPLE_ROOT)
        pszFromCat = NULL;
    else
        pszFromCat = findAmpleCategoryName(get_from(hp),
                                       pAmple_in->pCategories);
    pszToCat    = findAmpleCategoryName(get_to(hp),
                                       pAmple_in->pCategories);
    pszProps    = build_prop_string(ap->sPropertySet,
                                    &pAmple_in->sProperties);
    pszFeatures = build_feature_string(mp->pszMorphFd,
                                     pszFromCat, pszToCat,
                                     pszProps ? pszProps : "");
    if (pszProps != NULL)
        freeMemory(pszProps);
    pNewMorph = buildPATRWord(pszLex, pszGloss,
                              pszPATRCategory, pszFeatures,
                              mp->pPATRFeature,
                              &pAmple_in->sPATR);
    freeMemory(pszFeatures);
    pNewMorph->pNext = pWord;
    pWord            = pNewMorph;
    }
...
}

5.2.5 Source File

`patalloc.c'

5.3 buildPATRWordForKimmo

5.3.1 Syntax

#include "patr.h"

PATRWord * buildPATRWordForKimmo(char *           pszLex_in,
                                 char *           pszGloss_in,
                                 char *           pszCat_in,
                                 unsigned short * puiFeatIndexes_in,
                                 char **          ppszFeatures_in,
                                 PATRData *       pPATR_in);

5.3.2 Description

buildPATRWordForKimmo converts the supplied information into the form needed to apply a PC-PATR analysis.

The arguments to buildPATRWordForKimmo are as follows:

pszLex_in
points to the lexical form.
pszGloss_in
points to a gloss string.
pszCat_in
points to the grammatical category.
puiFeatIndexes_in
points to an array of feature (template) name indexes for features associated with this item.
ppszFeatures_in
points to the array of feature (template) names indexed by the members of puiFeatIndexes_in.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the grammar.
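The relationship between puiFeatIndexes_in and ppszFeatures_in is that each index selects one name from the names array. The sketch below joins the selected names into a single space separated string, one plausible way to picture the feature information for an item. This manual does not say how the length of the index array is determined, so the sketch takes an explicit count; that parameter and the joinIndexedNames helper are hypothetical.

```c
#include <string.h>

/*
 *  Join the template names selected by puiIndexes_in (uCount_in entries)
 *  into one space separated string.  Returns 0 on success, or -1 if
 *  the buffer is too small.
 */
int joinIndexedNames(const unsigned short * puiIndexes_in, size_t uCount_in,
                     char ** ppszNames_in,
                     char * pszBuffer_out, size_t uSize_in)
{
    size_t uUsed = 0;
    size_t i;

    if (uSize_in == 0)
        return -1;
    pszBuffer_out[0] = '\0';
    for ( i = 0 ; i < uCount_in ; ++i )
        {
        const char * pszName = ppszNames_in[puiIndexes_in[i]];
        size_t       uNeeded = strlen(pszName) + ((i > 0) ? 1 : 0);

        if (uUsed + uNeeded >= uSize_in)         /* leave room for the NUL */
            return -1;
        if (i > 0)
            pszBuffer_out[uUsed++] = ' ';
        strcpy(pszBuffer_out + uUsed, pszName);
        uUsed += strlen(pszName);
        }
    return 0;
}
```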

5.3.3 Return Value

a pointer to a dynamically allocated PATRWord data structure encoding the supplied information

5.3.4 Example

See section 5.19 parseWithPATR.

5.3.5 Source File

`patrkimm.c'

5.4 collectPATRParseGarbage

5.4.1 Syntax

#include "patr.h"

void collectPATRParseGarbage(PATRData * pPATR_io);

5.4.2 Description

collectPATRParseGarbage cleans up the memory used by parseWithPATR. If the parse results are needed for an extended period of time, storePATREdgeList must be called after parseWithPATR and before collectPATRParseGarbage.

collectPATRParseGarbage has only one argument:

pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.4.3 Return Value

none

5.4.4 Example

See section 5.19 parseWithPATR.

5.4.5 Source File

`patalloc.c'

5.5 convertKimmoPATRToWordAnalyses

5.5.1 Syntax

#include "patr.h"
#include "kimmo.h"

WordAnalysis * convertKimmoPATRToWordAnalyses(
                    KimmoResult *        pKimmoResult_in,
                    KimmoData *          pKimmo_in,
                    StringList *         pCategoryPath_in,
                    int                  cDecomp_in,
                    PATRLabeledFeature * pFdDefinitions_in,
                    WordAnalysis *       pAnalyses_io,
                    unsigned *           puiAmbigCount_io,
                    PATRData *           pPATR_io);

5.5.2 Description

convertKimmoPATRToWordAnalyses converts the result of a PC-Kimmo analysis into a form suitable for output via the writeTemplate function. It is part of the PATR function library rather than the Kimmo library because it must manipulate feature structures that are internal to the PATR library.

The arguments to convertKimmoPATRToWordAnalyses are as follows:

pKimmoResult_in
points to a list of analyses returned by applyKimmoRecognizer.
pKimmo_in
points to the Kimmo data used by applyKimmoRecognizer.
pCategoryPath_in
points to the feature path used to find the word category in the top level feature structure associated with each Kimmo analysis.
cDecomp_in
is the character used to separate the morphemes in a word decomposition string.
pFdDefinitions_in
points to the set of mappings from a feature structure to a set of feature names.
pAnalyses_io
points to the set of analyses that have already been converted.
puiAmbigCount_io
points to a counter that stores the number of distinct analyses in the output.
pPATR_io
points to the data structure that contains the PC-PATR language data.

5.5.3 Return Value

a pointer to a list of word analysis data structures

5.5.4 Example

#include <stdio.h>
#include "patr.h"
#include "kimmo.h"
...
PATRLabeledFeature *    pFdDefinitions_g   = NULL;
...
static void analyzeFile(FILE * pInputFP_in,
                        FILE * pOutputFP_in,
                        char * pszOutputFile_in,
                        TextControl * pTextControl_in,
                        KimmoData *   pKimmo_in)
{
WordTemplate *          pWord;
WordAnalysis *          pAnal;
KimmoResult *           pResult;
unsigned                uiAmbiguityCount;
unsigned char *         pszWord;
size_t                  i;

while ((pWord = readTemplateFromText(pInputFP_in,
                                     pTextControl_in)) != NULL)
    {
    pWord->iOutputFlags = WANT_DECOMPOSITION | WANT_ORIGINAL;
    if (pWord->paWord != NULL)
        {
        uiAmbiguityCount = 0;
        for ( i = 0 ; pWord->paWord[i] ; ++i )
            {
            pszWord = (unsigned char *)pWord->paWord[i];
            pResult = applyKimmoRecognizer(pszWord, pKimmo_in);
            if (pResult != NULL)
                {
                pWord->pAnalyses = convertKimmoPATRToWordAnalyses(
                                         pResult,
                                         pKimmo_in,
                                         pCatPath_g,
                                         pTextControl_in->cDecomp,
                                         pFdDefinitions_g,
                                         pWord->pAnalyses,
                                         &uiAmbiguityCount,
                                         &pKimmo_in->sPATR);
                freeKimmoResult( pResult );
                /*
                 *  adjust output for available fields
                 */
                pWord->iOutputFlags &= ~WANT_FEATURES;
                pWord->iOutputFlags &= ~WANT_CATEGORY;
                for (   pAnal = pWord->pAnalyses ;
                        pAnal ;
                        pAnal = pAnal->pNext )
                    {
                    if (    (pAnal->pszFeatures != NULL) &&
                            (*pAnal->pszFeatures != NUL) )
                        pWord->iOutputFlags |= WANT_FEATURES;
                    if (    (pAnal->pszCategory != NULL) &&
                            (*pAnal->pszCategory != NUL) )
                        pWord->iOutputFlags |= WANT_CATEGORY;
                    }
                }
            }
        }
    writeTemplate(pOutputFP_in, pszOutputFile_in, pWord,
                  pTextControl_in );
    freeWordTemplate( pWord );
    }
}

5.5.5 Source File

`cvtkp2wa.c'

5.6 freePATREdgeList

5.6.1 Syntax

#include "patr.h"

void freePATREdgeList(PATREdgeList * pPATRResult_io,
                      PATRData *     pPATR_io);

5.6.2 Description

freePATREdgeList frees the memory allocated for a parse chart.

The arguments to freePATREdgeList are as follows:

pPATRResult_io
points to a parse chart previously stored by storePATREdgeList.
pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.6.3 Return Value

none

5.6.4 Example

See section 5.19 parseWithPATR.

5.6.5 Source File

`patalloc.c'

5.7 freePATRFeature

5.7.1 Syntax

#include "patr.h"

void freePATRFeature(PATRFeature * pFeature_io,
                     PATRData * pPATR_io);

5.7.2 Description

freePATRFeature frees the memory allocated for a PC-PATR feature structure.

The arguments to freePATRFeature are as follows:

pFeature_io
points to a feature structure that is no longer needed.
pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.7.3 Return Value

none

5.7.4 Example

#include "patr.h"
...
static void free_word_categs(PATRWordCategory * pwc, PATRData * pThis)
{
if (pwc)
    {
    freeMemory(pwc->pszCategory);
    freePATRFeature(pwc->pFeature, pThis);
    free_word_categs(pwc->pNext, pThis);
    freeMemory(pwc);
    }
}

5.7.5 Source File

`patalloc.c'

5.8 freePATRGrammar

5.8.1 Syntax

#include "patr.h"

void freePATRGrammar(PATRData * pPATR_io);

5.8.2 Description

freePATRGrammar frees the memory allocated for a PC-PATR grammar.

freePATRGrammar has only one argument:

pPATR_io
points to the data structure that contains the PC-PATR language data such as the grammar rules.

5.8.3 Return Value

none

5.8.4 Example

See section 5.20 parseWithPATRLexicon.

5.8.5 Source File

`grammar.c'

5.9 freePATRInternalMemory

5.9.1 Syntax

#include "patr.h"

void freePATRInternalMemory(PATRData * pPATR_io);

5.9.2 Description

freePATRInternalMemory frees some memory used internally by various PC-PATR library functions. It should be called only if the grammar and lexicon have already been freed.

freePATRInternalMemory has only one argument:

pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.9.3 Return Value

none

5.9.4 Example

See section 5.20 parseWithPATRLexicon.

5.9.5 Source File

`patrfunc.c'

5.10 freePATRLexicon

5.10.1 Syntax

#include "patr.h"

void freePATRLexicon(PATRData * pPATR_io);

5.10.2 Description

freePATRLexicon frees the memory allocated for storing the PC-PATR lexicon.

freePATRLexicon has only one argument:

pPATR_io
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.10.3 Return Value

none

5.10.4 Example

See section 5.20 parseWithPATRLexicon.

5.10.5 Source File

`patrlexi.c'

5.11 loadPATRGrammar

5.11.1 Syntax

#include "patr.h"

int loadPATRGrammar(const char * pszGrammarFile_in,
                    PATRData *   pPATR_io);

5.11.2 Description

loadPATRGrammar loads the PC-PATR grammar from a file into memory. The entire grammar must fit into a single file.

The arguments to loadPATRGrammar are as follows:

pszGrammarFile_in
points to the name of the PC-PATR grammar file.
pPATR_io
points to the data structure that contains the PC-PATR language data such as the grammar.

5.11.3 Return Value

zero if an error occurs while loading the grammar, otherwise a non-zero value

5.11.4 Example

See section 5.20 parseWithPATRLexicon.

5.11.5 Source File

`grammar.c'

5.12 loadPATRLexicon

5.12.1 Syntax

#include "patr.h"

int loadPATRLexicon(const char * pszLexiconFile_in,
                    PATRData *   pPATR_io);

5.12.2 Description

loadPATRLexicon loads a PC-PATR lexicon file into memory. The lexicon may be spread out across several files, with a separate call to loadPATRLexicon for each file.

The arguments to loadPATRLexicon are as follows:

pszLexiconFile_in
points to the name of a PC-PATR lexicon file.
pPATR_io
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.12.3 Return Value

zero if an error occurs while loading the lexicon, otherwise a non-zero value

5.12.4 Example

See section 5.20 parseWithPATRLexicon.

5.12.5 Source File

`patrlexi.c'

5.13 loadPATRLexiconFromAmple

5.13.1 Syntax

#include "patr.h"
#include "opaclib.h"

int loadPATRLexiconFromAmple(const char *  pszAnalysisFile_in,
                             TextControl * pTextControl_in,
                             PATRData *    pPATR_io);

5.13.2 Description

loadPATRLexiconFromAmple loads an AMPLE style analysis file into the PC-PATR lexicon in memory. Several such files may be loaded to fill in the lexicon.

The arguments to loadPATRLexiconFromAmple are as follows:

pszAnalysisFile_in
points to the name of an AMPLE style analysis file.
pTextControl_in
points to a data structure that contains the ambiguity and decomposition marker characters.
pPATR_io
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.13.3 Return Value

zero if an error occurs, or a non-zero value if lexicon entries are successfully loaded from the analysis file

5.13.4 Example

#include "patr.h"
#include "opaclib.h"

PATRData        sPATRData_g;
TextControl     sTextControl_g;
...
void processUsingAmple(char * pszGrammar_in, char * pszAnalysis_in,
                       char * pszInput_in, char * pszOutput_in)
{
FILE *  pInputFP;
FILE *  pOutputFP;
char *  pszLine;
int     iSentenceCount;
int     iParseCount;

if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0)
    return;
if (loadPATRLexiconFromAmple(pszAnalysis_in,
                             &sTextControl_g, &sPATRData_g) != 0)
    {
    pInputFP = fopen(pszInput_in, "r");
    if (pInputFP != NULL)
        {
        pOutputFP = fopen(pszOutput_in, "w");
        if (pOutputFP != NULL)
            {
            iSentenceCount = 0;
            while ((pszLine = readLineFromFile(pInputFP,
                                               NULL, '\0')) != NULL)
                {
                ++iSentenceCount;
                iParseCount = parseWithPATRLexicon(pszLine,
                                                   pOutputFP,
                                                   NULL, FALSE,
                                                   &sPATRData_g);
                showAmbiguousProgress(iParseCount, iSentenceCount);
                }
            fclose(pOutputFP);
            }
        fclose(pInputFP);
        }
    freePATRLexicon(&sPATRData_g);
    }
freePATRGrammar(&sPATRData_g);
freePATRInternalMemory(&sPATRData_g);
}

5.13.5 Source File

`patrampl.c'

5.14 markPATRParseGarbage

5.14.1 Syntax

#include "patr.h"

void markPATRParseGarbage(PATRData * pPATR_io);

5.14.2 Description

markPATRParseGarbage sets a garbage collection marker. Since C does not support automatic garbage collection, and the unification algorithm can lose track of allocated feature structures, special work must be done to keep memory from leaking away. This function must be called before calling parseWithPATR, and collectPATRParseGarbage must be called afterwards.
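
The mark/collect discipline resembles a mark-and-release ("arena") allocator. The following is a hypothetical sketch of that general technique, not the library's implementation. Arena, arenaMark, arenaCollect, and storeString are invented names that only play the roles of markPATRParseGarbage, collectPATRParseGarbage, and storePATREdgeList/storePATRFeature.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena standing in for the library's temporary parse storage. */
typedef struct
{
    unsigned char * pBase;    /* one big block obtained from malloc */
    size_t          uiSize;   /* total bytes in the block */
    size_t          uiUsed;   /* bytes handed out so far */
} Arena;

int arenaInit(Arena * pArena, size_t uiSize)
{
    pArena->pBase  = malloc(uiSize);
    pArena->uiSize = (pArena->pBase != NULL) ? uiSize : 0;
    pArena->uiUsed = 0;
    return pArena->pBase != NULL;
}

void * arenaAlloc(Arena * pArena, size_t uiBytes)
{
    void * p;

    uiBytes = (uiBytes + 7) & ~(size_t)7;      /* keep blocks 8-byte aligned */
    if (uiBytes > pArena->uiSize - pArena->uiUsed)
        return NULL;
    p = pArena->pBase + pArena->uiUsed;
    pArena->uiUsed += uiBytes;
    return p;
}

/* Role of markPATRParseGarbage: remember how much is in use now. */
size_t arenaMark(const Arena * pArena)
{
    return pArena->uiUsed;
}

/* Role of collectPATRParseGarbage: release everything allocated after
 * the mark in one step, whether or not anything still points to it. */
void arenaCollect(Arena * pArena, size_t uiMark)
{
    assert(uiMark <= pArena->uiUsed);
    pArena->uiUsed = uiMark;
}

/* Role of storePATREdgeList/storePATRFeature: copy a result out of the
 * arena so that it survives the next collection. */
char * storeString(const char * pszTemp)
{
    char * pszCopy = malloc(strlen(pszTemp) + 1);

    if (pszCopy != NULL)
        strcpy(pszCopy, pszTemp);
    return pszCopy;
}

void arenaDestroy(Arena * pArena)
{
    free(pArena->pBase);
    pArena->pBase  = NULL;
    pArena->uiSize = pArena->uiUsed = 0;
}
```

Releasing everything since the mark in one step makes lost temporary structures harmless. The cost is that anything needed afterward must be copied out before the collection, which is why storePATREdgeList must be called before collectPATRParseGarbage.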

markPATRParseGarbage has only one argument:

pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.14.3 Return Value

none

5.14.4 Example

See section 5.19 parseWithPATR.

5.14.5 Source File

`patalloc.c'

5.15 parseAmpleSentenceWithPATR

5.15.1 Syntax

#include "patr.h"

int parseAmpleSentenceWithPATR(WordTemplate ** pWords_in,
                               FILE *          pOutputFP_in,
                               char *          pszOutputFile_in,
                               int             bWarnUnusedFd_in,
                               int             bVerbose_in,
                               int             bWriteAmpleParses_in,
                               TextControl *   pTextControl_in,
                               PATRData *      pPATR_in);

5.15.2 Description

parseAmpleSentenceWithPATR tries to parse a sentence loaded from an AMPLE analysis file, possibly disambiguating the morphological analyses as a side-effect. It requires that a PC-PATR grammar be loaded, but does not use the PC-PATR lexicon.

The arguments to parseAmpleSentenceWithPATR are as follows:

pWords_in
points to a NULL-terminated array of pointers to word analyses.
pOutputFP_in
is an output FILE pointer.
pszOutputFile_in
points to the name of the output file.
bWarnUnusedFd_in
causes warning messages concerning undefined feature (template) names if TRUE.
bVerbose_in
allows output to the standard error stream (stderr) if TRUE.
bWriteAmpleParses_in
causes the PC-PATR sentence parse trees and feature structures to be written to the output file if TRUE.
pTextControl_in
points to the data structure that contains the ambiguity and decomposition marker characters.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the grammar.

5.15.3 Return Value

the number of successful parses of the sentence

5.15.4 Example

#include "patr.h"
#include "opaclib.h"
...
PATRData        sPATRData_g;
TextControl     sTextControl_g;
...
void disambiguate(char * pszAnalysis_in, char * pszOutput_in)
{
FILE *          pInputFP;
FILE *          pOutputFP;
WordTemplate ** pSentence;
unsigned        uiAmbiguityCount;
unsigned        uiSentenceCount;
unsigned        uiParseCount;

if (sPATRData_g.pGrammar == NULL)
    return;
pInputFP = fopen( pszAnalysis_in, "r");
if (pInputFP == NULL)
    return;
pOutputFP = fopen( pszOutput_in, "w" );
if (pOutputFP == NULL)
    {
    fclose(pInputFP);
    return;
    }
for ( uiSentenceCount = 0, uiParseCount = 0 ;; ++uiSentenceCount )
    {
    pSentence = readSentenceOfTemplates(pInputFP,
                                        pszAnalysis_in,
                                        ".!?",
                                        &sTextControl_g,
                                        sPATRData_g.pLogFP);
    if (pSentence == NULL)
        break;
    uiAmbiguityCount = parseAmpleSentenceWithPATR(
                                        pSentence,
                                        pOutputFP, pszOutput_in,
                                        FALSE, FALSE, TRUE,
                                        &sTextControl_g,
                                        &sPATRData_g);
    if (uiAmbiguityCount != 0)
        ++uiParseCount;
    }
fprintf(stderr,
        "File parsing statistics: %u sentences read, %u parsed\n",
        uiSentenceCount, uiParseCount);
fclose(pInputFP);
fclose(pOutputFP);
}

5.15.5 Source File

`patrampl.c'

5.16 parsePATRFeatureString

5.16.1 Syntax

#include "patr.h"

PATRFeature * parsePATRFeatureString(char *     pszField_in,
                                     PATRData * pPATR_in);

5.16.2 Description

parsePATRFeatureString creates a PC-PATR feature structure from its representation as a set of feature path expressions.
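
In PATR-II notation, a feature path expression pairs a path with a value, as in <head agr num> = sg. This manual does not spell out the exact syntax that parsePATRFeatureString accepts, for example how several expressions or template names are combined. The following is therefore only a hypothetical sketch of parsing a single expression with an atomic value. PathEquation and parsePathEquation are invented names, and the fixed size limits are arbitrary.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_PATH_LABELS 8
#define MAX_NAME_LEN    32

/* Hypothetical result of parsing one expression "<a b c> = value". */
typedef struct
{
    char aszLabels[MAX_PATH_LABELS][MAX_NAME_LEN]; /* labels along the path */
    int  cLabels;                                  /* how many labels */
    char szValue[MAX_NAME_LEN];                    /* atomic value */
} PathEquation;

static const char * skipSpace(const char * psz)
{
    while (isspace((unsigned char)*psz))
        ++psz;
    return psz;
}

/* Return 1 if psz holds exactly one well-formed expression, else 0. */
int parsePathEquation(const char * psz, PathEquation * pOut)
{
    size_t uiLen;

    pOut->cLabels = 0;
    psz = skipSpace(psz);
    if (*psz != '<')
        return 0;
    for (psz = skipSpace(psz + 1) ; *psz != '>' ; psz = skipSpace(psz + uiLen))
    {
        uiLen = strcspn(psz, " \t\r\n>");
        if (uiLen == 0 || uiLen >= MAX_NAME_LEN ||
            pOut->cLabels == MAX_PATH_LABELS)
            return 0;                       /* unterminated path or too long */
        memcpy(pOut->aszLabels[pOut->cLabels], psz, uiLen);
        pOut->aszLabels[pOut->cLabels][uiLen] = '\0';
        ++pOut->cLabels;
    }
    psz = skipSpace(psz + 1);               /* step past '>' */
    if (pOut->cLabels == 0 || *psz != '=')
        return 0;
    psz = skipSpace(psz + 1);
    uiLen = strcspn(psz, " \t\r\n");
    if (uiLen == 0 || uiLen >= MAX_NAME_LEN)
        return 0;
    memcpy(pOut->szValue, psz, uiLen);
    pOut->szValue[uiLen] = '\0';
    return *skipSpace(psz + uiLen) == '\0'; /* nothing may follow the value */
}
```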

The arguments to parsePATRFeatureString are as follows:

pszField_in
points to the string containing a PC-PATR feature structure represented as a set of feature path expressions.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the feature template definitions.

5.16.3 Return Value

pointer to a PC-PATR feature structure

5.16.4 Example

#include <string.h>
#include "patr.h"
#include "opaclib.h"
...
PATRData        sPATRData_g;
...
PATRLabeledFeature * extractPATRLabeledFeature(char * pszField_in)
{
char *                  p;
PATRFeature *           pFeature;
PATRLabeledFeature *    pNewFdDef;

p = strpbrk(pszField_in, whiteSpace);
if (p == NULL)
    return( NULL );

*p++ = NUL;
pFeature = parsePATRFeatureString(p, &sPATRData_g);
if (pFeature == NULL)
    return( NULL );

pNewFdDef = (PATRLabeledFeature *)allocMemory(
                                        sizeof(PATRLabeledFeature));
pNewFdDef->pszLabel = duplicateString(pszField_in);
pNewFdDef->pFeature = pFeature;
pNewFdDef->pNext    = NULL;
return( pNewFdDef );
}

5.16.5 Source File

`patrfunc.c'

5.17 parseWithAmpleForPATRLexicon

5.17.1 Syntax

#include "patr.h"
#include "ample.h"

PATRLexItem * parseWithAmpleForPATRLexicon(char *      pszWord_in,
                                           AmpleData * pAmple_in,
                                           PATRData *  pPATR_in);

5.17.2 Description

parseWithAmpleForPATRLexicon parses the word using the AMPLE information already loaded into memory. This provides an alternative to creating a word lexicon file if an AMPLE analysis (with morpheme lexicon files) already exists. It is commonly used as part of a morphological parsing function passed to parseWithPATRLexicon.

The arguments to parseWithAmpleForPATRLexicon are as follows:

pszWord_in
points to the word.
pAmple_in
points to the data structure that contains all the information needed for a morphological parse of the word.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.17.3 Return Value

a pointer to the node in the internal lexicon containing the newly parsed word, or NULL if it does not parse.

5.17.4 Example

#include <string.h>
#include "patr.h"
#include "kimmo.h"
#include "ample.h"
...
PATRData        sPATRData_g;
KimmoData       sKimmoData_g;
AmpleData       sAmpleData_g;
...
static PATRLexItem * tryMorphParse(char * pszWord_in)
{
if (sKimmoData_g.sPATR.pGrammar != NULL)
    {
    sKimmoData_g.pLogFP = sPATRData_g.pLogFP;
    return parseWithKimmoForPATRLexicon( pszWord_in,
                                         &sKimmoData_g,
                                         &sPATRData_g );
    }
else if (sAmpleData_g.pDictionary != NULL)
    {
    sAmpleData_g.pLogFP = sPATRData_g.pLogFP;
    return parseWithAmpleForPATRLexicon( pszWord_in,
                                         &sAmpleData_g,
                                         &sPATRData_g );
    }
else
    return NULL;
}

void parseFile(char * pszInput_in, char * pszOutput_in,
               char * pszLexicon_in)
{
FILE *          pInputFP;
FILE *          pOutputFP;
char *          pszLine;
unsigned        uiLine;
unsigned        uiSentences;
unsigned        uiParsed;
unsigned        uiAmbiguityCount;

if (sPATRData_g.pGrammar == NULL)
    return;
if (    (sPATRData_g.pLexicon        == NULL) &&
        (sKimmoData_g.sPATR.pGrammar == NULL) &&
        (sAmpleData_g.pDictionary    == NULL) )
    return;
pInputFP = fopen( pszInput_in, "r");
if (pInputFP == NULL)
    return;
pOutputFP = fopen( pszOutput_in, "w" );
if (pOutputFP == NULL)
    {
    fclose(pInputFP);
    return;
    }
for ( uiLine = 1, uiSentences = 0, uiParsed = 0 ;; )
    {
    pszLine = readLineFromFile(pInputFP, &uiLine,
                               sPATRData_g.cComment);
    if (pszLine == NULL)
        break;
    pszLine += strspn(pszLine, " \t\r\n");
    if (*pszLine == '\0')
        continue;                       /* skip blank lines uncounted */
    ++uiSentences;
    trimTrailingWhitespace(pszLine);
    fprintf(pOutputFP, "%s\n", pszLine);
    uiAmbiguityCount = parseWithPATRLexicon(pszLine,
                                            pOutputFP,
                                            tryMorphParse,
                                            TRUE,
                                            &sPATRData_g);
    if (uiAmbiguityCount != 0)
        ++uiParsed;
    }
fprintf(stderr,
        "File parsing statistics: %u sentences read, %u parsed\n",
        uiSentences, uiParsed);
fclose(pInputFP);
fclose(pOutputFP);
/*
 *  save the lexicon entries generated by the morphological parsers
 */
if (pszLexicon_in != NULL)
    {
    pOutputFP = fopen(pszLexicon_in, "w");
    if (pOutputFP != NULL)
        {
        writePATRLexicon(pOutputFP, &sPATRData_g);
        fclose(pOutputFP);
        }
    }
}

5.17.5 Source File

`patrampl.c'

5.18 parseWithKimmoForPATRLexicon

5.18.1 Syntax

#include "patr.h"
#include "kimmo.h"

PATRLexItem * parseWithKimmoForPATRLexicon(char *      pszWord_in,
                                           KimmoData * pKimmo_in,
                                           PATRData *  pPATR_in);

5.18.2 Description

parseWithKimmoForPATRLexicon parses the word using the PC-Kimmo information already loaded into memory. This provides an alternative to creating a word lexicon file if a PC-Kimmo analysis (with morpheme lexicon files) already exists. It is commonly used as part of a morphological parsing function passed to parseWithPATRLexicon.

The arguments to parseWithKimmoForPATRLexicon are as follows:

pszWord_in
points to the word.
pKimmo_in
points to the data structure that contains all the information needed for a morphological parse of the word.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.18.3 Return Value

a pointer to the node in the internal lexicon containing the newly parsed word, or NULL if it does not parse.

5.18.4 Example

See section 5.17 parseWithAmpleForPATRLexicon.

5.18.5 Source File

`parsepwk.c'

5.19 parseWithPATR

5.19.1 Syntax

#include "patr.h"

PATREdgeList * parseWithPATR(PATRWord * pSentence_in,
                             int *      piStage_out,
                             PATRData * pPATR_io);

5.19.2 Description

parseWithPATR is the primary parsing routine in the PC-PATR function library. It is a chart parser with these properties:

  1. bottom-up with top-down filtering
  2. left-to-right order: after each word is added to the chart, all possible edges that can be derived up to that point have been computed as a side effect
  3. unification of feature structures to constrain the context-free parse.
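
Properties 1 and 2 can be illustrated with a CKY-style recognizer for binary rules that adds one word at a time. This is a hypothetical simplification, not the algorithm in `lcparse.c'. It omits top-down filtering and feature unification, handles only rules with exactly two constituents, and only recognizes sentences without building trees. The toy categories and rules are invented for the example. Its point is property 2: once word j has been added, every constituent ending at word j is already in the chart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy categories, one bit each, so a chart cell can hold a
 * set of categories (a lexically ambiguous word just sets several bits). */
enum
{
    CAT_DET = 1 << 0, CAT_N  = 1 << 1, CAT_V = 1 << 2,
    CAT_NP  = 1 << 3, CAT_VP = 1 << 4, CAT_S = 1 << 5
};

/* Binary context-free rules: lhs -> left right */
typedef struct { unsigned uiLhs, uiLeft, uiRight; } BinaryRule;

static const BinaryRule aRules_g[] =
{
    { CAT_NP, CAT_DET, CAT_N  },
    { CAT_VP, CAT_V,   CAT_NP },
    { CAT_S,  CAT_NP,  CAT_VP },
};

#define MAX_WORDS 16

/* auChart_g[i][j] is the set of categories spanning words i..j-1 */
static unsigned auChart_g[MAX_WORDS + 1][MAX_WORDS + 1];

/* Return nonzero if the word categories can be combined into uiGoal. */
int recognizeSentence(const unsigned * puiWordCats, int cWords, unsigned uiGoal)
{
    int    i, j, k;
    size_t r;

    assert(cWords > 0 && cWords <= MAX_WORDS);
    memset(auChart_g, 0, sizeof auChart_g);
    for (j = 1 ; j <= cWords ; ++j)
    {
        auChart_g[j - 1][j] = puiWordCats[j - 1];  /* add word j to the chart */
        /* build every constituent ending at word j, shortest spans first */
        for (i = j - 2 ; i >= 0 ; --i)
            for (k = i + 1 ; k < j ; ++k)
                for (r = 0 ; r < sizeof aRules_g / sizeof aRules_g[0] ; ++r)
                    if ((auChart_g[i][k] & aRules_g[r].uiLeft) &&
                        (auChart_g[k][j] & aRules_g[r].uiRight))
                        auChart_g[i][j] |= aRules_g[r].uiLhs;
    }
    return (auChart_g[0][cWords] & uiGoal) != 0;
}
```

For the category sequence DET N V DET N ("the dog saw the cat"), the chart ends up with S spanning all five words. N DET yields nothing.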

The arguments to parseWithPATR are as follows:

pSentence_in
points to an ordered list of PATRWord data structures representing a sentence.
piStage_out
points to an integer that provides information about how well the parse actually succeeded. If it is not NULL, the integer it points to is set to one of these values:
  1. Successful.
  2. Turned off unification.
  3. Turned off top-down filtering.
  4. Can only produce "bushes", not an entire parse tree.
  5. Failed to produce anything.
  6. Out of memory.
  7. Out of time.
pPATR_io
points to the data structure that contains the PC-PATR language data such as the grammar.
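
A caller that passes a non-NULL piStage_out will usually want to report a stage other than success. The helper below is a hypothetical sketch. It ASSUMES that the stage values are the integers 1 through 7 in the order listed above, which this manual does not state explicitly, so check `patr.h' for the actual values before relying on it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper.  ASSUMPTION: stage codes are 1..7 in the order
 * listed in this manual; verify against patr.h. */
const char * describeParseStage(int iStage)
{
    static const char * const apszStages[] =
    {
        "successful",
        "unification was turned off",
        "top-down filtering was turned off",
        "produced only bushes, not an entire parse tree",
        "failed to produce anything",
        "out of memory",
        "out of time"
    };
    const int cStages = (int)(sizeof apszStages / sizeof apszStages[0]);

    if (iStage < 1 || iStage > cStages)
        return "unknown stage";
    return apszStages[iStage - 1];
}
```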

5.19.3 Return Value

a pointer to the parse chart constructed, or NULL if the parse fails

5.19.4 Example

#include "patr.h"

struct lex_item {
    char *              pszWord;
    char *              pszGloss;
    char *              pszCat;
    unsigned int *      puiFeatures;
    };
...
char **         ppszFeatureNames_g;
PATRData        sPATRData_g;
TRIE *          pLexicon_g;
...
PATREdgeList * parse(char * pszSentence_in)
{
PATREdgeList *          pResult   = NULL;
PATRWord *              pSentence = NULL;
PATRWord *              pNewWord;
PATRWord *              pPrevWord = NULL;
int                     bSaveUnification;
int                     bSaveTopDownFilter;
char *                  pszWord;
struct lex_item *       pLexItem;

if (pszSentence_in == NULL)
    return NULL;
/*
 *  save pointers to temporary parse structures
 */
markPATRParseGarbage(&sPATRData_g);
/*
 *  convert the sentence string to what parseWithPATR() wants
 */
for (   pszWord = strtok(pszSentence_in, " \t\n") ;
        pszWord ;
        pszWord = strtok(NULL, " \t\n") )
    {
    pLexItem = findDataInTRIE(pLexicon_g, pszWord);
    if (pLexItem == NULL)
        {
        reportError(ERROR_MSG,
                    "Cannot find \"%s\" in the lexicon\n",
                    pszWord);
        collectPATRParseGarbage(&sPATRData_g);
        return NULL;
        }
    pNewWord = buildPATRWordForKimmo(pszWord,
                                     pLexItem->pszGloss,
                                     pLexItem->pszCat,
                                     pLexItem->puiFeatures,
                                     ppszFeatureNames_g,
                                     &sPATRData_g);
    if (pPrevWord == NULL)                      /* If first (no prev) */
        pSentence = pNewWord;                   /* Set head to this */
    else
        pPrevWord->pNext = pNewWord;            /* Else link from prev */
    pPrevWord = pNewWord;                       /* Set prev to this */
    }

if (pSentence != NULL)
    {
    /*
     *  parse the sentence and save a permanent copy of the result
     */
    int iStage;
    pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
    if (pResult != NULL)
        {
        pResult = storePATREdgeList(pResult, &sPATRData_g);
        }
    }
/*
 *  Free any temporary parse structures
 */
collectPATRParseGarbage(&sPATRData_g);

return( pResult );
}

void processFile(char * pszFilename_in)
{
char *          pszLine;
FILE *          pInputFP;
PATREdgeList *  pParse;

pInputFP = fopen(pszFilename_in, "r");
if (pInputFP == NULL)
    return;
while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL)
    {
    pParse = parse(pszLine);
    if (pParse != NULL)
        {
        ...
        freePATREdgeList(pParse, &sPATRData_g);
        }
    }
fclose(pInputFP);
}

5.19.5 Source File

`lcparse.c'

5.20 parseWithPATRLexicon

5.20.1 Syntax

#include "patr.h"

int parseWithPATRLexicon(
            char *           pszSentence_in,
            FILE *           pOutputFP_in,
            PATRLexItem * (* pfMorphParser_in)(char * pszWord_in),
            int              bWarnUnusedFd_in,
            PATRData *       pPATR_in);

5.20.2 Description

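parseWithPATRLexicon parses a sentence using the PC-PATR grammar and lexicon already loaded into memory, and writes the results to the output file. Each word is looked up in the PC-PATR lexicon. A word that is not found there can instead be supplied by an optional morphological parser.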

The arguments to parseWithPATRLexicon are as follows:

pszSentence_in
points to a string containing a sentence to parse. The words must be separated by whitespace characters.
pOutputFP_in
is an output FILE pointer.
pfMorphParser_in
points to a function that has one argument, a character string representing a single word, and returns a pointer to a lexicon entry derived by a morphological parse of the word. If pfMorphParser_in is NULL, then no morphological parsing is done as a backup to the internal PC-PATR lexicon.
bWarnUnusedFd_in
allows warning messages concerning undefined feature (template) names if TRUE.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the grammar and lexicon.
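
The following is a sketch of the lookup-then-fallback behavior that pfMorphParser_in enables. Everything in it is hypothetical: LexEntry stands in for PATRLexItem, and toyMorphParser stands in for a real parser such as the tryMorphParse function in section 5.17. The step that adds newly parsed words to the in-memory lexicon is an assumption. It is suggested by the writePATRLexicon description, which says that a morphological parser can be used to populate the lexicon, but it is not taken from the library source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, greatly simplified lexicon entry (not the real PATRLexItem). */
typedef struct LexEntry
{
    const char *      pszForm;
    const char *      pszCategory;
    struct LexEntry * pNext;
} LexEntry;

/* Same shape as pfMorphParser_in: one word in, one entry (or NULL) out. */
typedef LexEntry * (*MorphParser)(const char * pszWord);

/* Toy stand-in for a morphological parser: treats any word of two or
 * more letters ending in "s" as a plural noun.  Purely illustrative. */
static LexEntry * toyMorphParser(const char * pszWord)
{
    size_t     uiLen = strlen(pszWord);
    LexEntry * pEntry;
    char *     pszCopy;

    if (uiLen < 2 || pszWord[uiLen - 1] != 's')
        return NULL;
    pEntry  = malloc(sizeof *pEntry);
    pszCopy = malloc(uiLen + 1);
    if (pEntry == NULL || pszCopy == NULL)
    {
        free(pEntry);
        free(pszCopy);
        return NULL;
    }
    strcpy(pszCopy, pszWord);
    pEntry->pszForm     = pszCopy;
    pEntry->pszCategory = "N-pl";
    pEntry->pNext       = NULL;
    return pEntry;
}

/* Look up a word; if it is missing and a parser is supplied, parse it and
 * add the result to the front of the lexicon so later lookups find it. */
LexEntry * lookupWord(LexEntry ** ppLexicon, const char * pszWord,
                      MorphParser pfMorphParser)
{
    LexEntry * pEntry;

    for (pEntry = *ppLexicon ; pEntry != NULL ; pEntry = pEntry->pNext)
        if (strcmp(pEntry->pszForm, pszWord) == 0)
            return pEntry;
    if (pfMorphParser == NULL)
        return NULL;                  /* no backup parser: word is unknown */
    pEntry = pfMorphParser(pszWord);
    if (pEntry != NULL)
    {
        pEntry->pNext = *ppLexicon;   /* cache the newly parsed word */
        *ppLexicon    = pEntry;
    }
    return pEntry;
}
```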

5.20.3 Return Value

the number of valid parses found for the sentence

5.20.4 Example

See also section 5.17 parseWithAmpleForPATRLexicon.

#include "patr.h"
#include "opaclib.h"

PATRData        sPATRData_g;
...
void process(char * pszGrammar_in, char * pszLexicon_in,
             char * pszInput_in, char * pszOutput_in)
{
FILE *  pInputFP;
FILE *  pOutputFP;
char *  pszLine;
int     iSentenceCount;
int     iParseCount;

if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0)
    return;
if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0)
    {
    pInputFP = fopen(pszInput_in, "r");
    if (pInputFP != NULL)
        {
        pOutputFP = fopen(pszOutput_in, "w");
        if (pOutputFP != NULL)
            {
            iSentenceCount = 0;
            while ((pszLine = readLineFromFile(pInputFP,
                                               NULL, '\0')) != NULL)
                {
                ++iSentenceCount;
                iParseCount = parseWithPATRLexicon(pszLine,
                                                   pOutputFP,
                                                   NULL,
                                                   FALSE,
                                                   &sPATRData_g);
                showAmbiguousProgress(iParseCount, iSentenceCount);
                }
            fclose(pOutputFP);
            }
        fclose(pInputFP);
        }
    freePATRLexicon(&sPATRData_g);
    }
freePATRGrammar(&sPATRData_g);
freePATRInternalMemory(&sPATRData_g);
}

5.20.5 Source File

`patrlexi.c'

5.21 showPATRLexicon

5.21.1 Syntax

#include "patr.h"

void showPATRLexicon(PATRData * pPATR_in);

5.21.2 Description

showPATRLexicon writes the internal PC-PATR lexicon to the standard output stream (stdout). It is intended only as a debugging aid.

showPATRLexicon has only one argument:

pPATR_in
points to the data structure that contains the PC-PATR language data such as the lexicon.

5.21.3 Return Value

none

5.21.4 Example

#include "patr.h"

PATRData        sPATRData_g;
...
void test_lexicon(char * pszLexicon_in)
{
if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0)
    {
    showPATRLexicon(&sPATRData_g);
    }
}

5.21.5 Source File

`patrlexi.c'

5.22 storePATREdgeList

5.22.1 Syntax

#include "patr.h"

PATREdgeList * storePATREdgeList(PATREdgeList * pPATRResult_in,
                                 PATRData *     pPATR_io);

5.22.2 Description

storePATREdgeList makes a permanent (unaffected by garbage collection) copy of a parse chart. It should be called after parseWithPATR and before collectPATRParseGarbage. Note that freePATREdgeList is used to free the memory allocated by storePATREdgeList.

The arguments to storePATREdgeList are as follows:

pPATRResult_in
points to a parse chart returned by parseWithPATR.
pPATR_io
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.22.3 Return Value

a pointer to a newly allocated copy of the parse chart (PATREdgeList structure)

5.22.4 Example

See section 5.19 parseWithPATR.

5.22.5 Source File

`patalloc.c'

5.23 storePATRFeature

5.23.1 Syntax

#include "patr.h"

PATRFeature * storePATRFeature(PATRFeature * pFeature_in,
                               PATRData * pPATR_in);

5.23.2 Description

storePATRFeature makes a permanent (unaffected by garbage collection) copy of a feature structure. Note that freePATRFeature is used to free the memory allocated by storePATRFeature.

The arguments to storePATRFeature are as follows:

pFeature_in
points to a feature structure that may be needed beyond the next garbage collection call.
pPATR_in
points to the data structure that contains the PC-PATR language data and internal memory storage.

5.23.3 Return Value

a pointer to a newly allocated copy of the feature structure

5.23.4 Example

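No example has been written for this function yet. Its use parallels that of storePATREdgeList (see section 5.19 parseWithPATR): call storePATRFeature before the next call to collectPATRParseGarbage, and release the copy with freePATRFeature when it is no longer needed.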

5.23.5 Source File

`patalloc.c'

5.24 stringifyPATRParses

5.24.1 Syntax

#include "patr.h"

int stringifyPATRParses(PATREdgeList * pParses_in,
                        PATRData *     pPATR_in,
                        const char *   pszSentence_in,
                        char **        ppszBuffer_out);

5.24.2 Description

stringifyPATRParses creates a character string representation of a parse chart. The output string contains both the parse trees and the set of features indicated by the settings in the data structure pointed to by pPATR_in.

The arguments to stringifyPATRParses are as follows:

pParses_in
points to a parse chart produced by parseWithPATR.
pPATR_in
points to a data structure that contains the PC-PATR language data and control variables.
pszSentence_in
points to a C string containing the original sentence. It may be NULL.
ppszBuffer_out
points to a pointer which will contain either NULL or the address of dynamically allocated memory containing the character string representation of the parse chart.

5.24.3 Return Value

-1 if an error occurs, or 0 if successful
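
The ppszBuffer_out convention is easy to misuse. The caller passes the address of its own char * variable, and afterward owns the buffer stored there. The function below is a hypothetical stand-in that follows the same contract: it joins words rather than formatting a parse chart, returns -1 on error or 0 on success, and stores either NULL or a malloc'd string through a non-NULL output argument. It uses the C library's malloc and free. Code that calls stringifyPATRParses itself should release the buffer with freeMemory, as the example in 5.24.4 does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function with the same calling convention as
 * stringifyPATRParses: -1 on error, 0 on success, result via char **. */
int joinWords(const char * const * ppszWords, char ** ppszBuffer_out)
{
    size_t uiLen = 1;                        /* room for the final NUL */
    size_t i;
    char * psz;

    if (ppszBuffer_out == NULL)
        return -1;
    *ppszBuffer_out = NULL;                  /* NULL unless we succeed */
    if (ppszWords == NULL)
        return -1;
    for (i = 0 ; ppszWords[i] != NULL ; ++i)
        uiLen += strlen(ppszWords[i]) + 1;   /* word plus a separator */
    psz = malloc(uiLen);
    if (psz == NULL)
        return -1;
    psz[0] = '\0';
    for (i = 0 ; ppszWords[i] != NULL ; ++i)
    {
        if (i > 0)
            strcat(psz, " ");
        strcat(psz, ppszWords[i]);
    }
    *ppszBuffer_out = psz;                   /* caller now owns the buffer */
    return 0;
}
```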

5.24.4 Example

#include "patr.h"

struct lex_item {
    char *              pszWord;
    char *              pszGloss;
    char *              pszCat;
    unsigned int *      puiFeatures;
    };
...
char **         ppszFeatureNames_g;
PATRData        sPATRData_g;
TRIE *          pLexicon_g;
...
char * parse(char * pszSentence_in)
{
PATREdgeList *    pResult   = NULL;
PATRWord *        pSentence = NULL;
PATRWord *        pNewWord;
PATRWord *        pPrevWord = NULL;
int               bSaveUnification;
int               bSaveTopDownFilter;
char *            pszWord;
struct lex_item * pLexItem;
char *            pszResult = NULL;

if (pszSentence_in == NULL)
    return NULL;
/*
 *  save pointers to temporary parse structures
 */
markPATRParseGarbage(&sPATRData_g);
/*
 *  convert the sentence string to what parseWithPATR() wants
 */
for (   pszWord = strtok(pszSentence_in, " \t\n") ;
        pszWord ;
        pszWord = strtok(NULL, " \t\n") )
    {
    pLexItem = findDataInTRIE(pLexicon_g, pszWord);
    if (pLexItem == NULL)
        {
        reportError(ERROR_MSG,
                    "Cannot find \"%s\" in the lexicon\n",
                    pszWord);
        collectPATRParseGarbage(&sPATRData_g);
        return NULL;
        }
    pNewWord = buildPATRWordForKimmo(pszWord,
                                     pLexItem->pszGloss,
                                     pLexItem->pszCat,
                                     pLexItem->puiFeatures,
                                     ppszFeatureNames_g,
                                     &sPATRData_g);
    if (pPrevWord == NULL)                      /* If first (no prev) */
        pSentence = pNewWord;                   /* Set head to this */
    else
        pPrevWord->pNext = pNewWord;            /* Else link from prev */
    pPrevWord = pNewWord;                       /* Set prev to this */
    }

if (pSentence != NULL)
    {
    /*
     *  parse the sentence and convert the resulting chart to a string
     */
    int iStage;
    pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
    if (pResult != NULL)
        {
        stringifyPATRParses(pResult, &sPATRData_g, NULL, &pszResult);
        }
    }
/*
 *  Free any temporary parse structures
 */
collectPATRParseGarbage(&sPATRData_g);

return pszResult;
}

void processFile(char * pszFilename_in)
{
char *  pszLine;
FILE *  pInputFP;
char *  pszParse;

pInputFP = fopen(pszFilename_in, "r");
if (pInputFP == NULL)
    return;
while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL)
    {
    pszParse = parse(pszLine);
    if (pszParse != NULL)
        {
        ...
        freeMemory(pszParse);
        }
    }
fclose(pInputFP);
}

5.24.5 Source File

`patrstrg.c'

5.25 writePATRLexicon

5.25.1 Syntax

#include "patr.h"

void writePATRLexicon(FILE *     pOutputFP_in,
                      PATRData * pPATR_in);

5.25.2 Description

writePATRLexicon writes the internal PC-PATR lexicon to a file in a form suitable for reloading with loadPATRLexicon. This is most useful when a morphological parser is used to populate the lexicon. See section 5.17 parseWithAmpleForPATRLexicon, section 5.18 parseWithKimmoForPATRLexicon, and section 5.20 parseWithPATRLexicon.

The arguments to writePATRLexicon are as follows:

pOutputFP_in
is an output FILE pointer.
pPATR_in
points to the data structure that contains the PC-PATR language data such as the lexicon.
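writePATRLexicon writes a text file that loadPATRLexicon can read back. The sketch below formats one record in that style. It assumes the default standard format field markers \w (word), \c (category), \g (gloss), and \f (features), and follows each record with a blank line for readability. SketchLexEntry and formatSketchLexEntry are hypothetical names, not part of the library, and the real output of writePATRLexicon may contain other fields. The sketch formats into a buffer so that the caller can pass the result to fputs.

```c
#include <stdio.h>

/* Hypothetical in-memory entry; the library's own lexicon types differ. */
typedef struct
    {
    const char * pszWord;
    const char * pszCat;
    const char * pszGloss;      /* NULL if the entry has no gloss */
    const char * pszFeatures;   /* NULL if none; written verbatim */
    } SketchLexEntry;

/*
 *  Format one entry as a standard format record, assuming the default
 *  \w, \c, \g, and \f markers, followed by a blank line.  Like snprintf,
 *  returns the length of the complete record, so a value >= uiSize_in
 *  means the buffer was too small and the record was truncated.
 */
int formatSketchLexEntry(char *                 pszBuffer_out,
                         size_t                 uiSize_in,
                         const SketchLexEntry * pEntry_in)
{
    int bGloss = (pEntry_in->pszGloss != NULL);
    int bFeat  = (pEntry_in->pszFeatures != NULL);

    /* each optional field contributes marker + value + newline, or nothing */
    return snprintf(pszBuffer_out, uiSize_in,
                    "\\w %s\n\\c %s\n%s%s%s%s%s%s\n",
                    pEntry_in->pszWord,
                    pEntry_in->pszCat,
                    bGloss ? "\\g " : "", bGloss ? pEntry_in->pszGloss : "",
                    bGloss ? "\n" : "",
                    bFeat ? "\\f " : "", bFeat ? pEntry_in->pszFeatures : "",
                    bFeat ? "\n" : "");
}
```

Omitting absent fields entirely, rather than writing an empty marker, keeps each record minimal; whether loadPATRLexicon requires any particular field is not specified here.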

5.25.3 Return Value

none

5.25.4 Example

See section 5.17 parseWithAmpleForPATRLexicon.

5.25.5 Source File

`patrlexi.c'

5.26 writePATRParses

5.26.1 Syntax

#include "patr.h"

void writePATRParses(PATREdgeList * pParses_in,
                     FILE *         pOutputFP_in,
                     PATRData *     pPATR_in);

5.26.2 Description

writePATRParses writes the parse trees and associated features from the parse chart pointed to by pParses_in to the output file. pPATR_in->iMaxAmbiguities controls how many parse trees are written, and pPATR_in->eTreeDisplay controls how they are displayed. The bits in pPATR_in->iFeatureDisplay control which features are written and how they are displayed.

The arguments to writePATRParses are as follows:

pParses_in
points to a parse chart produced by parseWithPATR.
pOutputFP_in
is an output FILE pointer.
pPATR_in
points to a data structure that contains the PC-PATR language data and control variables.

5.26.3 Return Value

none

5.26.4 Example

#include "patr.h"

struct lex_item {
    char *              pszWord;
    char *              pszGloss;
    char *              pszCat;
    unsigned int *      puiFeatures;
    };
...
char **         ppszFeatureNames_g;
PATRData        sPATRData_g;
TRIE *          pLexicon_g;
...
void parse(char * pszSentence_in, FILE * pOutputFP_in)
{
PATREdgeList *          pResult   = NULL;
PATRWord *              pSentence = NULL;
PATRWord *              pNewWord;
PATRWord *              pPrevWord = NULL;
int                     bSaveUnification;
int                     bSaveTopDownFilter;
char *                  pszWord;
struct lex_item *       pLexItem;
unsigned                uiParseCount = 0;

if ((pszSentence_in == NULL) || (pOutputFP_in == NULL))
    return;
fprintf(pOutputFP_in, "%s\n", pszSentence_in);
/*
 *  save pointers to temporary parse structures
 */
markPATRParseGarbage(&sPATRData_g);
/*
 *  convert the sentence string to what parseWithPATR() wants
 */
for (   pszWord = strtok(pszSentence_in, " \t\n") ;
        pszWord ;
        pszWord = strtok(NULL, " \t\n") )
    {
    pLexItem = findDataInTRIE(pLexicon_g, pszWord);
    if (pLexItem == NULL)
        {
        reportError(ERROR_MSG,
                    "Cannot find \"%s\" in the lexicon\n",
                    pszWord);
        collectPATRParseGarbage(&sPATRData_g);
        return;
        }
    pNewWord = buildPATRWordForKimmo(pszWord,
                                     pLexItem->pszGloss,
                                     pLexItem->pszCat,
                                     pLexItem->puiFeatures,
                                     ppszFeatureNames_g,
                                     &sPATRData_g);
    if (pPrevWord == NULL)                      /* If first (no prev) */
        pSentence = pNewWord;                   /* Set head to this */
    else
        pPrevWord->pNext = pNewWord;            /* Else link from prev */
    pPrevWord = pNewWord;                       /* Set prev to this */
    }

if (pSentence != NULL)
    {
    /*
     *  parse the sentence and write the results
     */
    int iStage;
    const char * psz = NULL;
    PATREdgeList * pel;

    pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
    if (iStage != 0)
        fprintf(pOutputFP_in,
                "**** Cannot parse this sentence ****\n");
    switch (iStage)
        {
        case 0:
            for ( pel = pResult ; pel ; pel = pel->pNext )
                ++uiParseCount;
            break;
        case 1:
            psz = "**** Turning off unification ****\n";
            break;
        case 2:
            psz = "**** Turning off top-down filtering ****\n";
            break;
        case 3:
            psz = "**** Building the largest parse \"bush\" ****\n";
            break;
        case 4:
            psz = "**** No output available ****\n";
            break;
        case 5:
            psz = "**** Out of Memory (after %lu edges) ****\n";
            break;
        case 6:
            psz = "**** Out of Time (after %lu edges) ****\n";
            break;
        }
    if (psz)
        fprintf(pOutputFP_in, psz, (unsigned long)sPATRData_g.uiEdgesAdded);
    if (pResult)
        {
        writePATRParses(pResult, pOutputFP_in, &sPATRData_g);
        putc('\n', pOutputFP_in);
        }
    }
else
    {
    fprintf(pOutputFP_in, "**** Nothing to parse ****\n");
    }
/*
 *  Free any temporary parse structures
 */
collectPATRParseGarbage(&sPATRData_g);
}

void processFile(char * pszInput_in, char * pszOutput_in)
{
char *  pszLine;
FILE *  pInputFP;
FILE *  pOutputFP;

pInputFP = fopen(pszInput_in, "r");
if (pInputFP == NULL)
    return;
pOutputFP = fopen(pszOutput_in, "w");
if (pOutputFP == NULL)
    {
    fclose(pInputFP);
    return;
    }
while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL)
    {
    parse(pszLine, pOutputFP);
    }
fclose(pInputFP);
fclose(pOutputFP);
}
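The loop in parse() builds the word list with a head pointer and a trailing "previous" pointer, so each new word is appended in constant time without walking the list. The sketch below isolates that idiom in standard C. SketchWord is a hypothetical stand-in for PATRWord, and malloc stands in for buildPATRWordForKimmo; neither substitution reflects how the library actually allocates words.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PATRWord; only the link field matters here. */
typedef struct SketchWord
    {
    const char *        pszForm;    /* points into the tokenized sentence */
    struct SketchWord * pNext;
    } SketchWord;

/* Free every node of a list built by buildSketchWordList(). */
void freeSketchWordList(SketchWord * pHead_in)
{
    while (pHead_in != NULL)
        {
        SketchWord * pDead = pHead_in;
        pHead_in = pHead_in->pNext;
        free(pDead);
        }
}

/*
 *  Split pszSentence_io in place (strtok writes NULs into it) and return
 *  the words as a list in sentence order, or NULL if there are no words
 *  or memory runs out (in which case the partial list is freed).
 */
SketchWord * buildSketchWordList(char * pszSentence_io)
{
    SketchWord * pHead = NULL;
    SketchWord * pPrev = NULL;
    char *       pszWord;

    for (   pszWord = strtok(pszSentence_io, " \t\n") ;
            pszWord ;
            pszWord = strtok(NULL, " \t\n") )
        {
        SketchWord * pNew = malloc(sizeof *pNew);
        if (pNew == NULL)
            {
            freeSketchWordList(pHead);
            return NULL;
            }
        pNew->pszForm = pszWord;
        pNew->pNext   = NULL;
        if (pPrev == NULL)
            pHead = pNew;           /* first word becomes the head */
        else
            pPrev->pNext = pNew;    /* later words are linked from the tail */
        pPrev = pNew;
        }
    return pHead;
}
```

Like the example, this relies on strtok, which modifies its argument and is not reentrant, so the sentence buffer must be writable and only one tokenization may be in progress at a time.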

5.26.5 Source File

`userpatr.c'

5.27 writePATRStyledOutput

5.27.1 Syntax

#include "patr.h"

void writePATRStyledOutput(PATREdgeList *    pParses_in,
                           char *            pszWord_in,
                           char *            pszLex_in,
                           char *            pszGloss_in,
                           FILE *            pOutputFP_in,
                           PATRFeatureTags * pFeatTags_in,
                           char *            pszParseStartTag_in,
                           char *            pszParseEndTag_in,
                           PATRData *        pPATR_in,
                           unsigned *        puiAmbigCount_io);

5.27.2 Description

writePATRStyledOutput writes the parse trees and associated features from the parse chart pointed to by pParses_in in a highly stylized fashion. (It was written for KTAGGER and may not be useful for any other purpose.)

The arguments to writePATRStyledOutput are as follows:

pParses_in
points to a parse chart produced by parseWithPATR. Each parse tree is written as the value of the <TREE> feature referenced by pFeatTags_in, and its top-level feature structure is written as the value of the <FEAT> feature referenced by pFeatTags_in.
pszWord_in
points to the word (or sentence) that was parsed by parseWithPATR. It is written as the value of the <WORD> feature referenced by pFeatTags_in.
pszLex_in
points to a concatenated string of morphemes (words) found in the word (sentence). It is written as the value of the <LEX> feature referenced by pFeatTags_in.
pszGloss_in
points to a concatenated string of glosses for the morphemes (words) found in the word (sentence). It is written as the value of the <GLOSS> feature referenced by pFeatTags_in.
pOutputFP_in
is an output FILE pointer.
pFeatTags_in
points to a list of data structures, each containing a feature path with its associated start and end tags. Feature paths that do not match one of the five special values (<TREE>, <FEAT>, <WORD>, <LEX>, or <GLOSS>) are matched against the top-level feature structure associated with the current parse.
pszParseStartTag_in
points to a string used to mark the beginning of a parse in the output file.
pszParseEndTag_in
points to a string used to mark the end of a parse in the output file.
pPATR_in
points to a data structure that contains the PC-PATR language data and control variables.
puiAmbigCount_io
points to an unsigned integer counter. writePATRStyledOutput adds the number of parses in pParses_in to its current value, so the caller should initialize it before the first call (the example below sets it to zero for each word).
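The five special feature paths are handled separately from ordinary paths, which are looked up in the top-level feature structure of the current parse. The sketch below shows that dispatch step in isolation. It assumes the paths are compared as exact, case-sensitive strings; SketchPathKind and classifySketchPath are hypothetical names, and the library's own matching of PATRFeatureTags entries may work differently.

```c
#include <stddef.h>
#include <string.h>

typedef enum
    {
    SKETCH_PATH_TREE,       /* write the parse tree */
    SKETCH_PATH_FEAT,       /* write the top-level feature structure */
    SKETCH_PATH_WORD,       /* write pszWord_in */
    SKETCH_PATH_LEX,        /* write pszLex_in */
    SKETCH_PATH_GLOSS,      /* write pszGloss_in */
    SKETCH_PATH_OTHER       /* look the path up in the top-level features */
    } SketchPathKind;

/* Map a path string to one of the five special values, or to OTHER.
   Assumes an exact, case-sensitive string comparison. */
SketchPathKind classifySketchPath(const char * pszPath_in)
{
    static const struct
        {
        const char *   pszName;
        SketchPathKind eKind;
        } asSpecial[] =
        {
        { "<TREE>",  SKETCH_PATH_TREE  },
        { "<FEAT>",  SKETCH_PATH_FEAT  },
        { "<WORD>",  SKETCH_PATH_WORD  },
        { "<LEX>",   SKETCH_PATH_LEX   },
        { "<GLOSS>", SKETCH_PATH_GLOSS },
        };
    size_t i;

    for (i = 0; i < sizeof asSpecial / sizeof asSpecial[0]; ++i)
        {
        if (strcmp(pszPath_in, asSpecial[i].pszName) == 0)
            return asSpecial[i].eKind;
        }
    return SKETCH_PATH_OTHER;
}
```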

5.27.3 Return Value

none

5.27.4 Example

#include <stdio.h>
#include <string.h>
#include "patr.h"
#include "kimmo.h"
#include "opaclib.h"
...
KimmoData               sKimmoData_g;
PATRFeatureTags *       pFeatureTags_g;
...
void process(char * pszInput_in, char * pszOutput_in)
{
char *          pszLine;
char *          pszWord;
KimmoResult *   pKimmoResults;
KimmoResult *   pResult;
char *          pszMorphGlosses = NULL;
char *          pszMorphLexes = NULL;
unsigned        uiAmbiguityCount;
unsigned        uiDotsCount = 0;
FILE *          pInputFP;
FILE *          pOutputFP;

pInputFP = fopen(pszInput_in, "r");
if (pInputFP == NULL)
    return;
pOutputFP = fopen(pszOutput_in, "w");
if (pOutputFP == NULL)
    {
    fclose(pInputFP);
    return;
    }
while ((pszLine = readLineFromFile(pInputFP, NULL, 0)) != NULL)
    {
    pszWord = pszLine + strspn(pszLine, " \t\r\n\f");
    if (*pszWord == '\0')
        continue;
    trimTrailingWhitespace(pszWord);
    fprintf(pOutputFP, "<word>\n");

    pKimmoResults = applyKimmoRecognizer((unsigned char *)pszWord,
                                         &sKimmoData_g);

    for (   pResult = pKimmoResults, uiAmbiguityCount = 0 ;
            pResult ;
            pResult = pResult->pNext )
        {
        pszMorphLexes   = (char *)concatKimmoMorphLexemes(
                                                pResult->pAnalysis,
                                                "",
                                                &sKimmoData_g);
        pszMorphGlosses = (char *)concatKimmoMorphGlosses(
                                                pResult->pAnalysis,
                                                "",
                                                &sKimmoData_g);
        writePATRStyledOutput(pResult->pParseChart,
                              pszWord,
                              pszMorphLexes,
                              pszMorphGlosses,
                              pOutputFP,
                              pFeatureTags_g,
                              "<parse>", "</parse>",
                              &sKimmoData_g.sPATR,
                              &uiAmbiguityCount);
        fprintf(pOutputFP, "\n");
        freeMemory(pszMorphLexes);
        freeMemory(pszMorphGlosses);
        }
    if (pKimmoResults == NULL)
        fprintf(pOutputFP, "<parse>*** %s ***</parse>\n", pszWord);
    else
        freeKimmoResult( pKimmoResults );
    fprintf(pOutputFP, "</word>\n");
    }
fclose(pInputFP);
fclose(pOutputFP);
}
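In the example above, leading whitespace is skipped with strspn and trailing whitespace is removed with trimTrailingWhitespace. Because strspn returns a count of characters rather than a pointer, that count has to be added to the start of the line. The sketch below performs both steps with the standard C library only; trimSketchLine is a hypothetical helper, not a replacement for the library's trimTrailingWhitespace.

```c
#include <string.h>

/* Whitespace characters skipped in the example above. */
static const char szSketchSpace[] = " \t\r\n\f";

/*
 *  Return a pointer to the first non-whitespace character of pszLine_io,
 *  after overwriting any trailing whitespace with NUL bytes.  An all-blank
 *  line yields a pointer to an empty string.
 */
char * trimSketchLine(char * pszLine_io)
{
    char * pszStart = pszLine_io + strspn(pszLine_io, szSketchSpace);
    size_t uiLen    = strlen(pszStart);

    while (uiLen > 0 && strchr(szSketchSpace, pszStart[uiLen - 1]) != NULL)
        pszStart[--uiLen] = '\0';
    return pszStart;
}
```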

5.27.5 Source File

`wrtstyle.c'


This document was generated on 20 March 2003 using texi2html 1.56k.