PC-PATR Function Library Reference Manual functions for unification based parsing version 1.2 January 2000 by Stephen McConnel Copyright (C) 2000 SIL International Published by: Language Software Development SIL International 7500 W. Camp Wisdom Road Dallas, TX 75236 U.S.A. Permission is granted to make and distribute verbatim copies of this file provided the copyright notice and this permission notice are preserved in all copies. The author may be reached at the address above or via email as `steve@acadcomp.sil.org'. Introduction to the PC-PATR function library ******************************************** PC-PATR is an implementation for personal computers of the PATR-II computational linguistic formalism. The PATR-II formalism can be viewed as a computer language for encoding linguistic information. It does not presuppose any particular theory of syntax. It was originally developed by Stuart M. Shieber at Stanford University in the early 1980's. A PATR-II grammar consists of a set of rules and a lexicon. Each rule consists of a context-free _phrase structure rule_ and a set of _feature constraints_, that is, _unifications_ on the _feature structures_ associated with the constituents of the phrase structure rules. The lexicon provides the items that can replace the terminal symbols of the phrase structure rules, that is, the words of the language together with their relevant features. This function library contains the processing functions used by PC-PATR and related programs. It has been developed with the goal of making it easier to cast PATR-II style parsing into different frameworks. The first use of this library has been to add a morphotactic component to PC-Kimmo consisting of a PC-PATR word parser. PC-PATR (and thus this function library) is still under development. The author would appreciate feedback directed to the following address: Stephen McConnel (972)708-7361 (office) Language Software Development (972)708-7561 (fax) SIL International 7500 W. 
Camp Wisdom Road Dallas, TX 75236 steve@acadcomp.sil.org U.S.A. or Stephen_McConnel@Sil.org Variable and function naming conventions **************************************** The basic goal behind choosing names in the PC-PATR function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name. Preprocessor macro names ======================== Preprocessor macro names are written entirely in capital letters. If the name requires more than one word for an adequate description, the words are joined together with intervening underscore (`_') characters. Data structure names ==================== Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, depending on the capitalization pattern to make them readable as separate words. Variable names ============== Variable names in the PC-PATR function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book `Code Complete' on pages 202-206. Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix. 
Type prefix ----------- The type prefix has the following basic possibilities: `b' a Boolean, usually encoded as a `char', `short', or `int' `c' a character, usually a `char' but sometimes a `short' or `int' `d' a double precision floating point number, that is, a `double' `e' an enumeration, encoded as an `enum' or as a `char', `short', or `int' `i' an integer, that is, an `int', `short', `long', or (rarely) `char' `s' a data structure defined by a `struct' statement `sz' a NUL (that is, zero) terminated character string `pf' a pointer to a function In addition, the basic types may be prefixed by these qualifiers: `u' indicates that an integer or a character is unsigned `a' indicates an array of the basic type `p' indicates a pointer to the type, possibly a pointer to an array or to a pointer Descriptive name ---------------- The descriptive name portion of a variable name consists of one or more capitalized words concatenated together. There are no underscores (`_') separating these words from each other, or from the type prefix. For the PC-PATR function library, the descriptive name for global variables begins with PATR. Scope suffix ------------ The scope suffix has these possibilities: `_g' indicates a global variable accessible throughout the program `_m' indicates a module (semiglobal) variable accessible throughout the file (declared `static') `_in' indicates a function argument used for input `_out' indicates a function argument used for output (must be a pointer) `_io' indicates a function argument used for both input and output (must be a pointer) `_s' indicates a function variable that retains its value between calls (declared `static') The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call. 
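The conventions above can be seen together in a short sketch. Everything in it is hypothetical and exists only for illustration: `SAMPLE_MAX_WORDS', `SampleWord', `splitSampleWords', `nextSampleIndex', and the variables are not part of the PC-PATR function library, and the names use `Sample' where a real library global would use PATR.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_MAX_WORDS 8              /* macro: capitals joined by underscores */

typedef struct {                        /* data structure: capitalized words */
    const char * pszText;               /* p + sz: points into a NUL-terminated string */
    unsigned     uiLength;              /* u + i: unsigned integer */
} SampleWord;

int         iSampleLastCount_g = 0;     /* _g: global variable */
static char cSampleSeparator_m = ' ';   /* _m: module variable (declared static) */

/* function name: lowercase verb followed by a capitalized noun phrase */
int splitSampleWords(const char * pszText_in,   /* _in: used for input only */
                     SampleWord * paWords_out,  /* _out: array filled in by the call */
                     int *        piTotal_io)   /* _io: read and then updated */
{
    const char * pszStart;              /* no suffix: ordinary local variable */
    int          iFound = 0;

    assert(pszText_in != NULL && paWords_out != NULL && piTotal_io != NULL);
    while (iFound < SAMPLE_MAX_WORDS) {
        while (*pszText_in == cSampleSeparator_m)
            ++pszText_in;               /* skip separator characters */
        if (*pszText_in == '\0')
            break;                      /* end of the input string */
        pszStart = pszText_in;
        while (*pszText_in != '\0' && *pszText_in != cSampleSeparator_m)
            ++pszText_in;               /* advance to the end of the word */
        paWords_out[iFound].pszText  = pszStart;
        paWords_out[iFound].uiLength = (unsigned)(pszText_in - pszStart);
        ++iFound;
    }
    *piTotal_io += iFound;
    iSampleLastCount_g = iFound;
    return iFound;
}

int nextSampleIndex(void)
{
    static int iIndex_s = 0;            /* _s: retains its value between calls */
    return ++iIndex_s;
}
```

Calling `splitSampleWords' with the string "cows eat grass" fills the first three elements of the output array, adds three to the running total, and records three in `iSampleLastCount_g'. The prefix and suffix on each name show its type and scope without referring back to the declaration.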
Function names
==============

Global function names in the PC-PATR function library have two parts: a verb that is all lowercase followed by a noun phrase containing one or more capitalized words. These pieces are concatenated without any intervening underscores (`_'). For the PC-PATR library functions, the noun phrase section includes PATR.

Examples
========

Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.

`SAMPLE_NAME'
     is a preprocessor macro.

`SampleName'
     is a data structure.

`pSampleName'
     is a local pointer variable.

`writeSampleName'
     is a function (that may apply to a data structure named `SampleName').

PC-PATR data structures
***********************

The PC-PATR functions operate on a number of different data structures. The most important of these are described in the following sections. The PC-PATR functions also use a number of other data structures internally, but it should not be necessary for a programmer to manipulate them directly.

PATRData
========

Definition
----------

     /*
      *  forward declarations of internal PATR data types
      */
     typedef struct patr_grammar PATRGrammar;
     typedef struct patr_lexicon PATRLexicon;

     typedef struct {
         char          bFailure;
         char          bUnification;
         char          eTreeDisplay;
         char          bGloss;
         char          bGlossesExist;
         char          iFeatureDisplay;
         char          bCheckCycles;
         char          bTopDownFilter;
         short         iMaxAmbiguities;
         short         iDebugLevel;
         char          cComment;
         char          bSilent;
         char          bShowWarnings;
         char          bPromoteDefAtoms;
         time_t        iMaxProcTime;
         FILE *        pLogFP;
         char *        pszGrammarFile;
         PATRGrammar * pGrammar;
         char *        pszRecordMarker;
         char *        pszWordMarker;
         char *        pszGlossMarker;
         char *        pszCategoryMarker;
         char *        pszFeatureMarker;
         PATRLexicon * pLexicon;
         int           iCurrentIndex;
         int           iParseCount;
     } PATRData;

Description
-----------

The `PATRData' data structure collects the information used for data processing within the PC-PATR functions. Its general purpose is to reduce the number of parameters needed by the various functions.
`bFailure' causes parser failures to be preserved and displayed if `TRUE' (nonzero). `bUnification' enables unification while parsing if `TRUE' (nonzero). If `FALSE', the parser acts only as a context free chart parser, which usually produces much more ambiguous output. `eTreeDisplay' is the tree display mode, one of these symbolic constant values: `PATR_NO_TREE' prevents any output of parse trees. `PATR_FLAT_TREE' displays parse trees as parenthesized, nested lists. For example, (S (NP (N cows))(VP (VerbalP (V eat))(NP (N grass)))) `PATR_FULL_TREE' displays parse trees with a text representation of the tree structure. For example, S _____|_____ NP VP | ___|____ N VerbalP NP cows | | V N eat grass `PATR_INDENTED_TREE' displays parse trees in an indented (outline) fashion. For example, S NP N cows VP VerbalP V eat NP N grass `bGloss' causes glosses (if they exist) to be displayed if `TRUE'. `bGlossesExist' is set automatically according to whether or not glosses exist when the lexicon is loaded. `iFeatureDisplay' is a bit vector that encodes the feature display mode: `iFeatureDisplay & PATR_FEATURE_ON' allows the output of feature structures. If this bit is cleared (zero), feature structures are not written to the output, but may still be used in parsing. `iFeatureDisplay & PATR_FEATURE_FLAT' causes feature output to be "flattened" into a compact form which is less readable for humans, but just as easily parsed by a computer program. 
If this bit is cleared, a top level feature structure looks like this in the output: S: [ cat: S head: [ agr: $1[ 3sg: - ] finite:+ pos: V vform: BASE ] subj: [ cat: NP head: [ agr: $1 case: NOM number:PL pos: N proper:- verbal:- ] ] ] On the other hand, if this bit is set, the same feature structure would be written like this: S: [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ] subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N proper:- verbal:- ] ] ] `iFeatureDisplay & PATR_FEATURE_ALL' causes all of the feature structures to be written to the output, not just the top level feature structure. If this bit is cleared, the feature structure output associated with a parse might look like one of the previous two examples. If this bit is set, the output might look like the following instead: S_1: [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ] subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N proper:- verbal:- ] ] ] NP_2: [ cat:NP head:[ agr:[ 3sg:- ] number:PL pos:N proper:- verbal:- ] ] N_3: [ cat:N gloss:`cow head:[ agr:[ 3sg:- ] number:PL pos:N proper:- verbal:- ] lex:cows root_pos:N ] VP_4: [ cat:VP head:[ finite:+ pos:V vform:BASE ] ] VerbalP_5: [ cat:VerbalP head:[ finite:+ pos:V vform:BASE ] ] V_6: [ cat:V gloss:`eat head:[ pos:V vform:BASE ] lex:eat root_pos:V ] NP_7: [ cat:NP head:[ agr:[ 3sg:+ ] number:SG pos:N proper:- verbal:- ] ] N_8: [ cat:N gloss:`grass head:[ agr:[ 3sg:+ ] number:SG pos:N proper:- verbal:- ] lex:grass root_pos:N ] `iFeatureDisplay & PATR_FEATURE_TRIM' prevents empty feature structures from being written to the output. 
If this bit is cleared, then a feature structure might look like this: VP_3: [ cat: VP head: [ form: finite trans: [ pred: sleep arg1: $1[] arg2: [] ] ] syncat: [ first: [ cat: NP head: [ agreement: [ person:third number:singular ] trans: $1 ] ] rest: end ] ] If this bit is set, the same data structure would look like this: VP_3: [ cat: VP head: [ form: finite trans: [ pred: sleep ] ] syncat: [ first: [ cat: NP head: [ agreement: [ person:third number:singular ] ] ] rest: end ] ] `bCheckCycles' determines whether to enable checking for parse cycles while parsing. `bTopDownFilter' determines whether to enable top down filtering while parsing. `iMaxAmbiguities' is the maximum number of alternative parse trees to show in the output. `iDebugLevel' is the degree of debugging output desired (0 means none). `cComment' is the character that begins a comment in an input line. (`PATR_DEFAULT_COMMENT' is a symbol for the default value.) `bSilent' determines whether to disable messages to the "standard error" stream (`stderr'). `bShowWarnings' determines whether to enable warnings as well as error messages. `bPromoteDefAtoms' determines whether default atomic values in features loaded from the lexicon are "promoted" to ordinary atomic values before parsing. (This can affect feature unification since a conflicting default value does not cause a failure: the default value merely disappears.) `iMaxProcTime' determines the maximum number of seconds a parse is allowed to take. A value of `0' means no limit. `pLogFP' is the `FILE' pointer for an output log file (`NULL' means none). `pszGrammarFile' points to the name of the current PC-PATR grammar file (`NULL' means none). `pGrammar' points to the current PC-PATR grammar data (`NULL' means none). `pszRecordMarker' points to the standard format marker for lexicon records. `pszWordMarker' points to the standard format marker for lexicon word fields. `pszGlossMarker' points to the standard format marker for lexicon gloss fields. 
`pszCategoryMarker' points to the standard format marker for lexicon category fields. `pszFeatureMarker' points to the standard format marker for lexicon feature fields. `pLexicon' points to the current PC-PATR lexicon (`NULL' means none). `iCurrentIndex' is used for internal processing. It records the index number of the current edge. `iParseCount' is used for internal processing. It records the number of parses found. Source File ----------- `patr.h' PATREdgeList ============ Definition ---------- /* * forward declaration of an internal PATR data type */ typedef struct patr_edge PATREdge; typedef struct patr_edge_list { PATREdge * pEdge; struct patr_edge_list * pNext; } PATREdgeList; Description ----------- The `PATREdgeList' data structure encodes a list of parse results returned by the PC-PATR parsing functions. `pEdge' points to a parse tree encoded as an edge in the parse chart. `pNext' points to the next parse tree encoded as an edge in the parse chart. Source File ----------- `patr.h' PATRFeatureTags =============== Definition ---------- #include "strlist.h" typedef struct patr_feat_tags { StringList * pFeaturePath; char * pszStartTag; char * pszEndTag; struct patr_feat_tags * pNext; } PATRFeatureTags; Description ----------- The `PATRFeatureTags' data structure contains information needed to write feature structures to an output file in a stylized fashion. `pFeaturePath' points to a feature path encoded as a list of feature label strings. `pszStartTag' points to the text string written to the output file before the given feature value. `pszEndTag' points to the text string written to the output file after the given feature value. `pNext' points to another `PATRFeatureTags' data structure. This facilitates building a list of such items. 
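A minimal sketch of building such a list by linking through `pNext' follows. It rests on assumptions: the `StringList' definition is a stand-in for the one in `strlist.h', which is not shown in this manual, and the helpers `addSampleFeatureTag' and `findSampleFeatureTag' are hypothetical names that are not part of the PC-PATR function library. Memory is obtained with the standard `malloc' purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: stand-in for StringList from strlist.h (layout not verified). */
typedef struct strlist {
    char *           pszString;
    struct strlist * pNext;
} StringList;

/* Same layout as the PATRFeatureTags definition given in this section. */
typedef struct patr_feat_tags {
    StringList *            pFeaturePath;
    char *                  pszStartTag;
    char *                  pszEndTag;
    struct patr_feat_tags * pNext;
} PATRFeatureTags;

/* Hypothetical helper: prepend one record and return the new list head. */
PATRFeatureTags * addSampleFeatureTag(StringList *      pPath_in,
                                      char *            pszStart_in,
                                      char *            pszEnd_in,
                                      PATRFeatureTags * pList_in)
{
    PATRFeatureTags * pNew;

    assert(pszStart_in != NULL && pszEnd_in != NULL);
    pNew = malloc(sizeof *pNew);
    if (pNew == NULL)
        return pList_in;                /* out of memory: list is unchanged */
    pNew->pFeaturePath = pPath_in;
    pNew->pszStartTag  = pszStart_in;
    pNew->pszEndTag    = pszEnd_in;
    pNew->pNext        = pList_in;      /* link the old list behind the new record */
    return pNew;
}

/* Hypothetical helper: find the first record whose path begins with a label. */
PATRFeatureTags * findSampleFeatureTag(PATRFeatureTags * pList_in,
                                       const char *      pszLabel_in)
{
    PATRFeatureTags * pTag;

    for ( pTag = pList_in ; pTag != NULL ; pTag = pTag->pNext ) {
        if (pTag->pFeaturePath != NULL &&
            pTag->pFeaturePath->pszString != NULL &&
            strcmp(pTag->pFeaturePath->pszString, pszLabel_in) == 0)
            return pTag;
    }
    return NULL;                        /* no record uses that label */
}
```

Prepending keeps insertion constant-time, so the most recently added record is found first. A caller that cares about order would append instead.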
Source File ----------- `patr.h' PATRLabeledFeature ================== Definition ---------- /* * forward declaration of an internal PATR data type */ typedef struct patr_feature PATRFeature; typedef struct patr_labeled_feat { char * pszLabel; PATRFeature * pFeature; struct patr_labeled_feat * pNext; } PATRLabeledFeature; Description ----------- The `PATRLabeledFeature' data structure contains information needed to abbreviate a feature structure to a simple label (template name) while writing an output file. `pszLabel' points to the label (template name) associated with the `pFeature' value `pFeature' points to a feature structure `pNext' points to another `PATRLabeledFeature' data structure. This facilitates building a list of such items. Source File ----------- `patr.h' PATRWord ======== Definition ---------- /* * forward declaration of an internal PATR data type */ typedef struct patr_categ PATRWordCategory; typedef struct patr_word { int iWordNumber; char * pszWordName; PATRWordCategory * pCategories; struct patr_word * pNext; } PATRWord; Description ----------- The `PATRWord' data structure represents a single word of the sentence fed to the PC-PATR parsing function. A sentence is represented by a linked list of these data structures. `iWordNumber' is the number of the word in the sentence. `pszWordName' is the orthographic wordform. `pCategories' points to a list of word categories for this word. (This allows words to be syntactically ambiguous.) Each word category contains the feature structure associated with one sense of the word. `pNext' points to the next word in the sentence. `NULL' marks the end of the sentence. Source File ----------- `patr.h' The PC-PATR function library global variables ********************************************* This chapter gives the proper usage information about each of the global variables found in the PC-PATR function library. The `patr.h' header file contains the extern declarations for all of these variables. 
bCancelPATROperation_g
======================

Syntax
------

     #include "patr.h"

     extern int bCancelPATROperation_g;

Description
-----------

`bCancelPATROperation_g' can be set asynchronously to interrupt a PC-PATR parse that seems to be stuck.

Example
-------

     #include <signal.h>
     #include "patr.h"
     ...
     void sigint_handler(int iSignal_in)
     {
     /* ask the parser to stop, then reinstall this handler */
     bCancelPATROperation_g = TRUE;
     signal(SIGINT, sigint_handler);
     }
     ...
     signal(SIGINT, sigint_handler);
     ...

Source File
-----------

`patrdata.c'

cPATRPatchSep_g
===============

Syntax
------

     #include "patr.h"

     extern const char cPATRPatchSep_g;

Description
-----------

`cPATRPatchSep_g' is used to separate the revision and patch level values when printing the PC-PATR version number. `'a'' indicates an alpha release, `'b'' indicates a beta release, and `'.'' indicates a production release.

Example
-------

See the example for `iPATRVersion_g' below.

Source File
-----------

`patrdata.c'

iPATRPatchlevel_g
=================

Syntax
------

     #include "patr.h"

     extern const int iPATRPatchlevel_g;

Description
-----------

`iPATRPatchlevel_g' is the current "patch level" of the PC-PATR function library and program. This is the third level version number, reflecting bug fixes or internal improvements that should be functionally invisible to users.

Example
-------

See the example for `iPATRVersion_g' below.

Source File
-----------

`patrdata.c'

iPATRRevision_g
===============

Syntax
------

     #include "patr.h"

     extern const int iPATRRevision_g;

Description
-----------

`iPATRRevision_g' is the current "revision level" of the PC-PATR function library and program. This is the second level version number, reflecting changes to program behavior that require changes to the `PC-PATR Reference Manual'.

Example
-------

See the example for `iPATRVersion_g' below.
Source File
-----------

`patrdata.c'

iPATRVersion_g
==============

Syntax
------

     #include "patr.h"

     extern const int iPATRVersion_g;

Description
-----------

`iPATRVersion_g' is the current "version" number of the PC-PATR function library and program. This is the top level version number, reflecting a major rewrite of the program or major changes that make it incompatible with earlier versions of the program.

Example
-------

     #include <stdio.h>
     #include "patr.h"
     ...
     /* print the version as version.revision, separator, patch level */
     fprintf(stderr, "PC-PATR version %d.%d%c%d (%s), Copyright %s SIL\n",
             iPATRVersion_g, iPATRRevision_g, cPATRPatchSep_g,
             iPATRPatchlevel_g, pszPATRDate_g, pszPATRYear_g);
     #ifdef __DATE__
     fprintf(stderr, pszPATRCompileFormat_g,
             pszPATRCompileDate_g, pszPATRCompileTime_g);
     #else
     if (pszPATRTestVersion_g != NULL)
         fputs(pszPATRTestVersion_g, stderr);
     #endif
     ...

Source File
-----------

`patrdata.c'

pszPATRCompileDate_g
====================

Syntax
------

     #include "patr.h"

     #ifdef __DATE__
     extern const char * pszPATRCompileDate_g;
     #endif

Description
-----------

`pszPATRCompileDate_g' points to a string containing the date on which the PC-PATR library was compiled. It exists only if the C compiler preprocessor supports the `__DATE__' constant.

Example
-------

See the example for `iPATRVersion_g' above.

Source File
-----------

`patrdata.c'

pszPATRCompileFormat_g
======================

Syntax
------

     #include "patr.h"

     #ifdef __DATE__
     #ifdef __TIME__
     extern const char * pszPATRCompileFormat_g;
     #endif
     #endif

Description
-----------

`pszPATRCompileFormat_g' points to a `printf' style format string suitable for displaying `pszPATRCompileDate_g' and `pszPATRCompileTime_g'. It exists only if the C compiler preprocessor supports the `__DATE__' and `__TIME__' constants.

Example
-------

See the example for `iPATRVersion_g' above.
Source File
-----------

`patrdata.c'

pszPATRCompileTime_g
====================

Syntax
------

     #include "patr.h"

     #ifdef __TIME__
     extern const char * pszPATRCompileTime_g;
     #endif

Description
-----------

`pszPATRCompileTime_g' points to a string containing the time at which the PC-PATR library was compiled. It exists only if the C compiler preprocessor supports the `__TIME__' constant.

Example
-------

See the example for `iPATRVersion_g' above.

Source File
-----------

`patrdata.c'

pszPATRDate_g
=============

Syntax
------

     #include "patr.h"

     extern const char * pszPATRDate_g;

Description
-----------

`pszPATRDate_g' points to a string containing the date on which the PC-PATR library was last modified.

Example
-------

See the example for `iPATRVersion_g' above.

Source File
-----------

`patrdata.c'

pszPATRTestVersion_g
====================

Syntax
------

     #include "patr.h"

     #ifndef __DATE__
     extern const char * pszPATRTestVersion_g;
     #endif

Description
-----------

`pszPATRTestVersion_g' points to a string describing the test status of PC-PATR (either alpha or beta). If this is a production release version, it is set to `NULL'. It is defined only if the C compiler preprocessor does not support the `__DATE__' constant.

Example
-------

See the example for `iPATRVersion_g' above.

Source File
-----------

`patrdata.c'

pszPATRYear_g
=============

Syntax
------

     #include "patr.h"

     extern const char * pszPATRYear_g;

Description
-----------

`pszPATRYear_g' points to a string containing the year in which the PC-PATR library was last modified. This is suitable for a copyright notice assigning the copyright to SIL International.

Example
-------

See the example for `iPATRVersion_g' above.

Source File
-----------

`patrdata.c'

PC-PATR functions
*****************

This chapter gives the proper usage information about each of the functions found in the PC-PATR function library. The prototypes and type definitions relevant to the use of these functions are all found in the `patr.h' header file.
addPATRLexItem
==============

Syntax
------

     #include "patr.h"

     void addPATRLexItem(char *        pszWord_in,
                         char *        pszGloss_in,
                         char *        pszCategory_in,
                         char *        pszFeatures_in,
                         PATRFeature * pFeature_in,
                         PATRData *    pPATR_io);

Description
-----------

`addPATRLexItem' adds one entry to the PC-PATR lexicon stored in memory.

The arguments to `addPATRLexItem' are as follows:

`pszWord_in'
     points to the orthographic string of the lexical item.

`pszGloss_in'
     points to a gloss string for the lexical item.

`pszCategory_in'
     points to the syntactic category for the lexical item.

`pszFeatures_in'
     points to a space delimited list of feature (template) names associated with the lexical item.

`pFeature_in'
     points to a feature structure associated with the lexical item. This is an alternative to `pszFeatures_in'.

`pPATR_io'
     points to the data structure that contains the PC-PATR language data such as the lexicon.

Return Value
------------

none

Example
-------

     #include <string.h>
     #include "patr.h"
     #include "opaclib.h"

     void storeTemplate(WordTemplate * pTemplate_in, PATRData * pPATR_in)
     {
     WordAnalysis * pAnal;
     char *         pszFeatures;
     char *         p;

     for ( pAnal = pTemplate_in->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
         {
         pszFeatures = NULL;
         if (pAnal->pszFeatures != NULL)
             {
             /* work on a copy, replacing each '=' separator with a space */
             pszFeatures = duplicateString(pAnal->pszFeatures);
             while ((p = strchr(pszFeatures, '=')) != NULL)
                 *p = ' ';
             }
         addPATRLexItem(pTemplate_in->pszSurfaceForm,
                        pAnal->pszAnalysis,
                        pAnal->pszCategory,
                        pszFeatures,
                        NULL,
                        pPATR_in);
         if (pszFeatures != NULL)
             {
             freeMemory(pszFeatures);
             pszFeatures = NULL;
             }
         }
     }

Source File
-----------

`patrlexi.c'

buildPATRWord
=============

Syntax
------

     #include "patr.h"

     PATRWord * buildPATRWord(char *        pszLex_in,
                              char *        pszGloss_in,
                              char *        pszCat_in,
                              char *        pszFeatures_in,
                              PATRFeature * pPATRFeature_in,
                              PATRData *    pPATR_in);

Description
-----------

`buildPATRWord' converts the given information into the form needed for a PATR parse.
This is used by the (X)AMPLE program in preparing a proposed word analysis for parsing with a word grammar.

The arguments to `buildPATRWord' are as follows:

`pszLex_in'
     contains the lexical form of the morpheme. It must not be `NULL'.

`pszGloss_in'
     contains a short gloss of the morpheme. It may be `NULL'.

`pszCat_in'
     contains the category of the morpheme. It must not be `NULL'.

`pszFeatures_in'
     contains zero or more feature template names separated by spaces or equal signs (`='). It may be `NULL'.

`pPATRFeature_in'
     is reserved for future use. It may be `NULL'.

`pPATR_in'
     points to the data structure that contains the PC-PATR language data such as the grammar (which includes template definitions). It must not be `NULL'.

Return Value
------------

a pointer to a dynamically allocated `PATRWord' data structure containing the morpheme information

Example
-------

     #include "ample.h"          /* #includes "patr.h" */
     #include "ampledef.h"
     ...
     PATREdgeList * perform_word_parse(pAnal_in, pAmple_in)
     AmpleHeadList *  pAnal_in;
     AmpleData *      pAmple_in;
     {
     AmpleHeadList *  hp;
     AmpleAllomorph * ap;
     AmpleMorpheme *  mp;
     PATRWord *       pWord = NULL;
     PATRWord *       pNewMorph = NULL;
     char *           pszLex;
     char *           pszGloss;
     char *           pszPATRCategory;
     char *           pszFromCat;
     char *           pszToCat;
     char *           pszProps;
     char *           pszFeatures;
     /*
      *  Convert the list of morphemes to what parseWithPATR() wants.
*/ for ( hp = pAnal_in ; hp ; hp = hp->pLeft ) { ap = hp->pAllomorph; if (ap == NULL) continue; /* should never happen */ mp = ap->pMorpheme; if (mp == NULL) continue; /* should never happen */ if (mp->pszUnderForm != NULL) pszLex = mp->pszUnderForm; else pszLex = ap->pszAllomorph; if ((pszLex == NULL) || (*pszLex == NUL)) pszLex = "0"; pszGloss = mp->pszMorphName; if (mp->pszPATRCat != NULL) { pszPATRCategory = mp->pszPATRCat; } else { switch (hp->eType) { case AMPLE_PFX: pszPATRCategory = "Prefix"; break; case AMPLE_IFX: if ( (hp->pRight == NULL) || (hp->pRight->eType == AMPLE_SFX) ) pszPATRCategory = "Suffix"; else pszPATRCategory = "Prefix"; break; case AMPLE_SFX: pszPATRCategory = "Suffix"; break; default: pszPATRCategory = "Root"; break; } } if (hp->eType == AMPLE_ROOT) pszFromCat = NULL; else pszFromCat = findAmpleCategoryName(get_from(hp), pAmple_in->pCategories); pszToCat = findAmpleCategoryName(get_to(hp), pAmple_in->pCategories); pszProps = build_prop_string(ap->sPropertySet, &pAmple_in->sProperties); pszFeatures = build_feature_string(mp->pszMorphFd, pszFromCat, pszToCat, pszProps ? pszProps : ""); if (pszProps != NULL) freeMemory(pszProps); pNewMorph = buildPATRWord(pszLex, pszGloss, pszPATRCategory, pszFeatures, mp->pPATRFeature, &pAmple_in->sPATR); freeMemory(pszFeatures); pNewMorph->pNext = pWord; pWord = pNewMorph; } ... } Source File ----------- `patalloc.c' buildPATRWordForKimmo ===================== Syntax ------ #include "patr.h" PATRWord * buildPATRWordForKimmo(char * pszLex_in, char * pszGloss_in, char * pszCat_in, unsigned short * puiFeatIndexes_in, char ** ppszFeatures_in, PATRData * pPATR_in); Description ----------- `buildPATRWordForKimmo' converts the supplied information into the form needed to apply a PC-PATR analysis. The arguments to `buildPATRWordForKimmo' are as follows: `pszLex_in' points to the lexical form. `pszGloss_in' points to a gloss string. `pszCat_in' points to the grammatical category. 
`puiFeatIndexes_in'
     points to an array of feature (template) name indexes for features associated with this item.

`ppszFeatures_in'
     points to the array of feature (template) names indexed by the members of `puiFeatIndexes_in'.

`pPATR_in'
     points to the data structure that contains the PC-PATR language data such as the grammar.

Return Value
------------

a pointer to a dynamically allocated `PATRWord' data structure encoding the supplied information

Example
-------

See the example for `parseWithPATR' below.

Source File
-----------

`patrkimm.c'

collectPATRParseGarbage
=======================

Syntax
------

     #include "patr.h"

     void collectPATRParseGarbage(PATRData * pPATR_io);

Description
-----------

`collectPATRParseGarbage' cleans up the memory used by `parseWithPATR'. If the parse results are needed after this cleanup, then `storePATREdgeList' must be called after `parseWithPATR' and before `collectPATRParseGarbage'.

`collectPATRParseGarbage' has only one argument:

`pPATR_io'
     points to the data structure that contains the PC-PATR language data and internal memory storage.

Return Value
------------

none

Example
-------

See the example for `parseWithPATR' below.

Source File
-----------

`patalloc.c'

convertKimmoPATRToWordAnalyses
==============================

Syntax
------

     #include "patr.h"
     #include "kimmo.h"

     WordAnalysis * convertKimmoPATRToWordAnalyses(
                                 KimmoResult *        pKimmoResult_in,
                                 KimmoData *          pKimmo_in,
                                 StringList *         pCategoryPath_in,
                                 int                  cDecomp_in,
                                 PATRLabeledFeature * pFdDefinitions_in,
                                 WordAnalysis *       pAnalyses_io,
                                 unsigned *           puiAmbigCount_io,
                                 PATRData *           pPATR_io);

Description
-----------

`convertKimmoPATRToWordAnalyses' converts the result of a PC-Kimmo analysis into a form suitable for output via the `writeTemplate' function. This is part of the PATR function library rather than the Kimmo library because it must manipulate feature structures that are internal to the PATR library.
The arguments to `convertKimmoPATRToWordAnalyses' are as follows:

`pKimmoResult_in'
     points to a list of analyses returned by `applyKimmoRecognizer'.

`pKimmo_in'
     points to the Kimmo data used by `applyKimmoRecognizer'.

`pCategoryPath_in'
     points to the feature path used to find the word category in the top level feature structure associated with each Kimmo analysis.

`cDecomp_in'
     is the character used to separate the morphemes in a word decomposition string.

`pFdDefinitions_in'
     points to the set of mappings from a feature structure to a set of feature names.

`pAnalyses_io'
     points to the set of analyses that have already been converted.

`puiAmbigCount_io'
     points to a counter that stores the number of distinct analyses in the output.

Return Value
------------

a pointer to a list of word analysis data structures

Example
-------

     #include <stdio.h>
     #include "patr.h"
     #include "kimmo.h"
     ...
     PATRLabeledFeature * pFdDefinitions_g = NULL;
     ...
     static void analyzeFile(FILE *        pInputFP_in,
                             FILE *        pOutputFP_in,
                             char *        pszOutputFile_in,
                             TextControl * pTextControl_in,
                             KimmoData *   pKimmo_in)
     {
     WordTemplate *  pWord;
     WordAnalysis *  pAnal;
     KimmoResult *   pResult;
     unsigned        uiAmbiguityCount;
     unsigned char * pszWord;
     size_t          i;

     while ((pWord = readTemplateFromText(pInputFP_in,
                                          pTextControl_in)) != NULL)
         {
         pWord->iOutputFlags = WANT_DECOMPOSITION | WANT_ORIGINAL;
         if (pWord->paWord != NULL)
             {
             uiAmbiguityCount = 0;
             for ( i = 0 ; pWord->paWord[i] ; ++i )
                 {
                 pszWord = (unsigned char *)pWord->paWord[i];
                 pResult = applyKimmoRecognizer(pszWord, pKimmo_in);
                 if (pResult != NULL)
                     {
                     pWord->pAnalyses = convertKimmoPATRToWordAnalyses(
                                                 pResult,
                                                 pKimmo_in,
                                                 pCatPath_g,
                                                 pTextControl_in->cDecomp,
                                                 pFdDefinitions_g,
                                                 pWord->pAnalyses,
                                                 &uiAmbiguityCount);
                     freeKimmoResult( pResult );
                     /*
                      *  adjust output for available fields
                      */
                     pWord->iOutputFlags &= ~WANT_FEATURES;
                     pWord->iOutputFlags &= ~WANT_CATEGORY;
                     for ( pAnal = pWord->pAnalyses ;
                           pAnal ;
                           pAnal = pAnal->pNext )
                         {
                         if ( (pAnal->pszFeatures != NULL) &&
                              (*pAnal->pszFeatures !=
NUL) ) pWord->iOutputFlags |= WANT_FEATURES; if ( (pAnal->pszCategory != NULL) && (*pAnal->pszCategory != NUL) ) pWord->iOutputFlags |= WANT_CATEGORY; } } } } writeTemplate(pOutputFP_in, pszOutputFile_in, pWord, pTextControl_in ); freeWordTemplate( pWord ); } } Source File ----------- `cvtkp2wa.c' freePATREdgeList ================ Syntax ------ #include "patr.h" void freePATREdgeList(PATREdgeList * pPATRResult_io, PATRData * pPATR_io); Description ----------- `freePATREdgeList' frees the memory allocated for a parse chart. The arguments to `freePATREdgeList' are as follows: `pPATRResult_io' points to a parse chart previously stored by `storePATREdgeList'. `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ none Example ------- See the example for `parseWithPATR' below. Source File ----------- `patalloc.c' freePATRFeature =============== Syntax ------ #include "patr.h" void freePATRFeature(PATRFeature * pFeature_io, PATRData * pPATR_io); Description ----------- `freePATRFeature' frees the memory allocated for a PC-PATR feature structure. The arguments to `freePATRFeature' are as follows: `pFeature_io' points to a feature structure that is no longer needed. `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ none Example ------- #include "patr.h" ... static void free_word_categs(pwc, pThis) PATRWordCategory * pwc; PATRData * pThis; { if (pwc) { freeMemory(pwc->pszCategory); freePATRFeature(pwc->pFeature, pThis); free_word_categs(pwc->pNext, pThis); freeMemory(pwc); } } Source File ----------- `patalloc.c' freePATRGrammar =============== Syntax ------ #include "patr.h" void freePATRGrammar(PATRData * pPATR_io); Description ----------- `freePATRGrammar' frees the memory allocated for a PC-PATR grammar. 
`freePATRGrammar' has only one argument: `pPATR_io' points to the data structure that contains the PC-PATR language data such as the grammar rules. Return Value ------------ none Example ------- See the example for `parseWithPATRLexicon' below. Source File ----------- `grammar.c' freePATRInternalMemory ====================== Syntax ------ #include "patr.h" void freePATRInternalMemory(PATRData * pPATR_io); Description ----------- `freePATRInternalMemory' frees some memory used internally by various PC-PATR library functions. It should be called only if the grammar and lexicon have already been freed. `freePATRInternalMemory' has only one argument: `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ none Example ------- See the example for `parseWithPATRLexicon' below. Source File ----------- `patrfunc.c' freePATRLexicon =============== Syntax ------ #include "patr.h" void freePATRLexicon(PATRData * pPATR_io); Description ----------- `freePATRLexicon' frees the memory allocated for storing the PC-PATR lexicon. `freePATRLexicon' has only one argument: `pPATR_io' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ none Example ------- See the example for `parseWithPATRLexicon' below. Source File ----------- `patrlexi.c' loadPATRGrammar =============== Syntax ------ #include "patr.h" int loadPATRGrammar(const char * pszGrammarFile_in, PATRData * pPATR_io); Description ----------- `loadPATRGrammar' loads the PC-PATR grammar from a file into memory. The entire grammar must fit into a single file. The arguments to `loadPATRGrammar' are as follows: `pszGrammarFile_in' points to the name of the PC-PATR grammar file. `pPATR_io' points to the data structure that contains the PC-PATR language data such as the grammar. 
Return Value ------------ zero if an error occurs while loading the grammar, otherwise a non-zero value Example ------- See the example for `parseWithPATRLexicon' below. Source File ----------- `grammar.c' loadPATRLexicon =============== Syntax ------ #include "patr.h" int loadPATRLexicon(const char * pszLexiconFile_in, PATRData * pPATR_io); Description ----------- `loadPATRLexicon' loads a PC-PATR lexicon file into memory. The lexicon may be spread out across several files, with a separate call to `loadPATRLexicon' for each file. The arguments to `loadPATRLexicon' are as follows: `pszLexiconFile_in' points to the name of a PC-PATR lexicon file. `pPATR_io' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ zero if an error occurs while loading the lexicon, otherwise a non-zero value Example ------- See the example for `parseWithPATRLexicon' below. Source File ----------- `patrlexi.c' loadPATRLexiconFromAmple ======================== Syntax ------ #include "patr.h" #include "opaclib.h" int loadPATRLexiconFromAmple(const char * pszAnalysisFile_in, TextControl * pTextControl_in, PATRData * pPATR_io); Description ----------- `loadPATRLexiconFromAmple' loads an AMPLE style analysis file into the PC-PATR lexicon in memory. Several such files may be loaded to fill in the lexicon. The arguments to `loadPATRLexiconFromAmple' are as follows: `pszAnalysisFile_in' points to the name of an AMPLE style analysis file. `pTextControl_in' points to a data structure that contains the ambiguity and decomposition marker characters. `pPATR_io' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ zero if an error occurs, or a non-zero value if lexicon entries are successfully loaded from the analysis file Example ------- #include "patr.h" #include "opaclib.h" PATRData sPATRData_g; TextControl sTextControl_g; ... 
void processUsingAmple(char * pszGrammar_in, char * pszAnalysis_in, char * pszInput_in, char * pszOutput_in) { FILE * pInputFP; FILE * pOutputFP; char * pszLine; int iSentenceCount; int iParseCount; if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0) return; if (loadPATRLexiconFromAmple(pszAnalysis_in, &sTextControl_g, &sPATRData_g) != 0) { pInputFP = fopen(pszInput_in, "r"); if (pInputFP != NULL) { pOutputFP = fopen(pszOutput_in, "w"); if (pOutputFP != NULL) { iSentenceCount = 0; while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) { ++iSentenceCount; iParseCount = parseWithPATRLexicon(pszLine, pOutputFP, NULL, FALSE, &sPATRData_g); showAmbiguousProgress(iParseCount, iSentenceCount); } fclose(pOutputFP); } fclose(pInputFP); } freePATRLexicon(&sPATRData_g); } freePATRGrammar(&sPATRData_g); freePATRInternalMemory(&sPATRData_g); } Source File ----------- `patrampl.c' markPATRParseGarbage ==================== Syntax ------ #include "patr.h" void markPATRParseGarbage(PATRData * pPATR_io); Description ----------- `markPATRParseGarbage' sets a garbage collection marker. Since C does not support automatic garbage collection, and the unification algorithm can lose track of allocated feature structures, special work must be done to keep memory from leaking away. This function must be called before calling `parseWithPATR', and `collectPATRParseGarbage' must be called afterwards. `markPATRParseGarbage' has only one argument: `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ none Example ------- See the example for `parseWithPATR' below. 
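The mark-and-collect discipline described above can be pictured with a small toy model. This is only a sketch of the general idea: every name in it (`ToyArena', `toyAlloc', `toyMark', `toyCollect') is hypothetical, and it makes no claim about how the PC-PATR library actually tracks its memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these names are hypothetical and are not part of the
 * PC-PATR library. */
typedef struct ToyBlock {
    struct ToyBlock *pNext;     /* block allocated just before this one */
} ToyBlock;

typedef struct {
    ToyBlock *pHead;            /* most recently allocated block */
    ToyBlock *pMark;            /* value of pHead when toyMark() was called */
    size_t    uiLive;           /* number of blocks not yet freed */
} ToyArena;

/* Allocate uiSize bytes and chain the block so that it can be reclaimed
 * later even if the caller loses the pointer.  The payload is aligned
 * only for pointer-sized data, which is enough for this sketch. */
static void *toyAlloc(ToyArena *pArena, size_t uiSize)
{
    ToyBlock *pBlock;

    assert(pArena != NULL);
    pBlock = malloc(sizeof(ToyBlock) + uiSize);
    if (pBlock == NULL)
        return NULL;
    pBlock->pNext = pArena->pHead;
    pArena->pHead = pBlock;
    ++pArena->uiLive;
    return pBlock + 1;
}

/* Remember the current allocation point. */
static void toyMark(ToyArena *pArena)
{
    assert(pArena != NULL);
    pArena->pMark = pArena->pHead;
}

/* Free every block allocated since the last toyMark() call and return
 * the number of blocks freed. */
static size_t toyCollect(ToyArena *pArena)
{
    size_t uiFreed = 0;

    assert(pArena != NULL);
    while (pArena->pHead != pArena->pMark) {
        ToyBlock *pDead = pArena->pHead;
        pArena->pHead = pDead->pNext;
        free(pDead);
        --pArena->uiLive;
        ++uiFreed;
    }
    return uiFreed;
}
```

In this model, anything allocated between `toyMark' and `toyCollect' disappears at collection time, which is why a result that must outlive the collection has to be copied elsewhere first.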
Source File
-----------

`patalloc.c'

parseAmpleSentenceWithPATR
==========================

Syntax
------

     #include "patr.h"

     int parseAmpleSentenceWithPATR(WordTemplate ** pWords_in,
                                    FILE *          pOutputFP_in,
                                    char *          pszOutputFile_in,
                                    int             bWarnUnusedFd_in,
                                    int             bVerbose_in,
                                    int             bWriteAmpleParses_in,
                                    TextControl *   pTextControl_in,
                                    PATRData *      pPATR_in);

Description
-----------

`parseAmpleSentenceWithPATR' tries to parse a sentence loaded from an AMPLE analysis file, possibly disambiguating the morphological analyses as a side-effect. It requires that a PC-PATR grammar be loaded, but does not use the PC-PATR lexicon.

The arguments to `parseAmpleSentenceWithPATR' are as follows:

`pWords_in'
     points to a `NULL'-terminated array of pointers to word analyses.

`pOutputFP_in'
     is an output `FILE' pointer.

`pszOutputFile_in'
     points to the name of the output file.

`bWarnUnusedFd_in'
     causes warning messages concerning undefined feature (template) names if `TRUE'.

`bVerbose_in'
     allows output to the standard error stream (`stderr') if `TRUE'.

`bWriteAmpleParses_in'
     causes the PC-PATR sentence parse trees and feature structures to be written to the output file if `TRUE'.

`pTextControl_in'
     points to the data structure that contains the ambiguity and decomposition marker characters.

`pPATR_in'
     points to the data structure that contains the PC-PATR language data such as the grammar.

Return Value
------------

the number of successful parses of the sentence

Example
-------

     #include "patr.h"
     #include "opaclib.h"
     ...
     PATRData    sPATRData_g;
     TextControl sTextControl_g;
     ...
void disambiguate(char * pszAnalysis_in, char * pszOutput_in) { FILE * pInputFP; FILE * pOutputFP; WordTemplate ** pSentence; unsigned uiAmbiguityCount; unsigned uiSentenceCount; unsigned uiParseCount; if (sPATRData_g.pGrammar == NULL) return; pInputFP = fopen( pszAnalysis_in, "r"); if (pInputFP == NULL) return; pOutputFP = fopen( pszOutput_in, "w" ); if (pOutputFP == NULL) { fclose(pInputFP); return; } for ( uiSentenceCount = 0, uiParseCount = 0 ;; ++uiSentenceCount ) { pSentence = readSentenceOfTemplates(pInputFP, pszAnalysis_in, ".!?", &sTextControl_g, sPATRData_g.pLogFP); if (pSentence == NULL) break; uiAmbiguityCount = parseAmpleSentenceWithPATR( pSentence, pOutputFP, pszOutput_in, FALSE, FALSE, TRUE, &sTextControl_g, &sPATRData_g); if (uiAmbiguityCount != 0) ++uiParseCount; } fprintf(stderr, "File parsing statistics: %u sentences read, %u parsed\n", uiSentenceCount, uiParseCount); fclose(pInputFP); fclose(pOutputFP); } Source File ----------- `patrampl.c' parsePATRFeatureString ====================== Syntax ------ #include "patr.h" PATRFeature * parsePATRFeatureString(char * pszField_in, PATRData * pPATR_in); Description ----------- `parsePATRFeatureString' creates a PC-PATR feature structure from its representation as a set of feature path expressions. The arguments to `parsePATRFeatureString' are as follows: `pszField_in' points to the string containing a PC-PATR feature structure represented as a set of feature path expressions. `pPATR_in' points to the data structure that contains the PC-PATR language data such as the feature template definitions. Return Value ------------ pointer to a PC-PATR feature structure Example ------- #include "patr.h" #include "opaclib.h" ... PATRData sPATRData_g; ... 
PATRLabeledFeature * extractPATRLabeledFeature(char * pszField_in) { char * p; PATRFeature * pFeature; PATRLabeledFeature * pNewFdDef; p = strpbrk(pszField_in, whiteSpace); if (p == NULL) return( NULL ); *p++ = NUL; pFeature = parsePATRFeatureString(p, &sPATRData_g); if (pFeature == NULL) return( NULL ); pNewFdDef = (PATRLabeledFeature *)allocMemory( sizeof(PATRLabeledFeature)); pNewFdDef->pszLabel = duplicateString(pszField_in); pNewFdDef->pFeature = pFeature; pNewFdDef->pNext = NULL; return( pNewFdDef ); } Source File ----------- `patrfunc.c' parseWithAmpleForPATRLexicon ============================ Syntax ------ #include "patr.h" #include "ample.h" PATRLexItem * parseWithAmpleForPATRLexicon(char * pszWord_in, AmpleData * pAmple_in, PATRData * pPATR_in); Description ----------- `parseWithAmpleForPATRLexicon' parses the word using the AMPLE information already loaded into memory. This provides an alternative to creating a word lexicon file if an AMPLE analysis (with morpheme lexicon files) already exists. It is commonly used as part of a morphological parsing function passed to `parseWithPATRLexicon'. The arguments to `parseWithAmpleForPATRLexicon' are as follows: `pszWord_in' points to the word. `pAmple_in' points to the data structure that contains all the information needed for a morphological parse of the word. `pPATR_in' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ a pointer to the node in the internal lexicon containing the newly parsed word, or `NULL' if it does not parse. Example ------- #include "patr.h" #include "kimmo.h" #include "ample.h" ... PATRData sPATRData_g; KimmoData sKimmoData_g; AmpleData sAmpleData_g; ... 
     static PATRLexItem * tryMorphParse(pszWord_in)
     char * pszWord_in;
     {
         if (sKimmoData_g.sPATR.pGrammar != NULL) {
             sKimmoData_g.pLogFP = sPATRData_g.pLogFP;
             return parseWithKimmoForPATRLexicon(pszWord_in,
                                                 &sKimmoData_g,
                                                 &sPATRData_g);
         }
         else if (sAmpleData_g.pDictionary != NULL) {
             sAmpleData_g.pLogFP = sPATRData_g.pLogFP;
             return parseWithAmpleForPATRLexicon(pszWord_in,
                                                 &sAmpleData_g,
                                                 &sPATRData_g);
         }
         else
             return NULL;
     }

     void parseFile(char * pszInput_in, char * pszOutput_in,
                    char * pszLexicon_in)
     {
         FILE *   pInputFP;
         FILE *   pOutputFP;
         char *   pszLine;
         unsigned uiLine;
         unsigned uiSentences;
         unsigned uiParsed;
         unsigned uiAmbiguityCount;

         if (sPATRData_g.pGrammar == NULL)
             return;
         if (    (sPATRData_g.pLexicon == NULL) &&
                 (sKimmoData_g.sPATR.pGrammar == NULL) &&
                 (sAmpleData_g.pDictionary == NULL) )
             return;
         pInputFP = fopen(pszInput_in, "r");
         if (pInputFP == NULL)
             return;
         pOutputFP = fopen(pszOutput_in, "w");
         if (pOutputFP == NULL) {
             fclose(pInputFP);
             return;
         }
         for ( uiLine = 1, uiSentences = 0, uiParsed = 0 ;; ++uiSentences ) {
             pszLine = readLineFromFile(pInputFP, &uiLine,
                                        sPATRData_g.cComment);
             if (pszLine == NULL)
                 break;
             /* skip leading whitespace and ignore empty lines */
             pszLine += strspn(pszLine, " \t\r\n");
             if (*pszLine == '\0')
                 continue;
             trimTrailingWhitespace(pszLine);
             fprintf(pOutputFP, "%s\n", pszLine);
             uiAmbiguityCount = parseWithPATRLexicon(pszLine, pOutputFP,
                                                     tryMorphParse, TRUE,
                                                     &sPATRData_g);
             if (uiAmbiguityCount != 0)
                 ++uiParsed;
         }
         fprintf(stderr,
                 "File parsing statistics: %u sentences read, %u parsed\n",
                 uiSentences, uiParsed);
         fclose(pInputFP);
         fclose(pOutputFP);
         /*
          *  save the lexicon entries generated by the morphological parsers
          */
         if (pszLexicon_in != NULL) {
             pOutputFP = fopen(pszLexicon_in, "w");
             if (pOutputFP != NULL) {
                 writePATRLexicon(pOutputFP, &sPATRData_g);
                 fclose(pOutputFP);
             }
         }
     }

Source File
-----------

`patrampl.c'

parseWithKimmoForPATRLexicon
============================

Syntax
------

     #include "patr.h"
     #include "kimmo.h"

     PATRLexItem * parseWithKimmoForPATRLexicon(char *      pszWord_in,
                                                KimmoData * pKimmo_in,
                                                PATRData *  pPATR_in);

Description
-----------
`parseWithKimmoForPATRLexicon' parses the word using the PC-Kimmo information already loaded into memory. This provides an alternative to creating a word lexicon file if a PC-Kimmo analysis (with morpheme lexicon files) already exists. It is commonly used as part of a morphological parsing function passed to `parseWithPATRLexicon'.

The arguments to `parseWithKimmoForPATRLexicon' are as follows:

`pszWord_in'
     points to the word.

`pKimmo_in'
     points to the data structure that contains all the information needed for a morphological parse of the word.

`pPATR_in'
     points to the data structure that contains the PC-PATR language data such as the lexicon.

Return Value
------------

a pointer to the node in the internal lexicon containing the newly parsed word, or `NULL' if it does not parse.

Example
-------

See the example for `parseWithAmpleForPATRLexicon' above.

Source File
-----------

`parsepwk.c'

parseWithPATR
=============

Syntax
------

     #include "patr.h"

     PATREdgeList * parseWithPATR(PATRWord * pSentence_in,
                                  int *      piStage_out,
                                  PATRData * pPATR_io);

Description
-----------

`parseWithPATR' is the primary parsing routine in the PC-PATR function library. It is a chart parser with these properties:

  1. bottom-up with top-down filtering

  2. left-to-right order: after each word is added to the chart, all possible edges that can be derived up to that point have been computed as a side-effect

  3. unification of feature structures to constrain the context-free parse.

The arguments to `parseWithPATR' are as follows:

`pSentence_in'
     points to an ordered list of `PATRWord' data structures representing a sentence.

`piStage_out'
     points to an integer that provides information about how well the parse actually succeeded. If it is not `NULL', the integer it points to is set to one of these values:

       0. Successful.
       1. Turned off unification.
       2. Turned off top-down filtering.
       3. Can only produce "bushes", not an entire parse tree.
       4. Failed to produce anything.
       5. Out of memory.
       6. Out of time.
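The stage values listed above lend themselves to a small lookup helper for diagnostics. The following is a minimal sketch: `describePATRStage' is a hypothetical name that is not part of the library, and the wording of each message is simply adapted from the list.

```c
#include <assert.h>

/* Hypothetical helper, not part of the PC-PATR library: maps the value
 * stored through piStage_out to a short description adapted from the
 * documented list.  Any other value yields "unknown stage". */
static const char *describePATRStage(int iStage)
{
    static const char *const apszStage[] = {
        "successful",                                      /* 0 */
        "turned off unification",                          /* 1 */
        "turned off top-down filtering",                   /* 2 */
        "can only produce bushes, not a full parse tree",  /* 3 */
        "failed to produce anything",                      /* 4 */
        "out of memory",                                   /* 5 */
        "out of time"                                      /* 6 */
    };
    const int iCount = (int)(sizeof apszStage / sizeof apszStage[0]);

    assert(iCount == 7);    /* one entry per documented value, 0 through 6 */
    if (iStage < 0 || iStage >= iCount)
        return "unknown stage";
    return apszStage[iStage];
}
```

A caller could pass the integer filled in by `parseWithPATR' to such a helper when logging why a sentence produced no complete parse.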
`pPATR_io'
     points to the data structure that contains the PC-PATR language data such as the grammar.

Return Value
------------

a pointer to the parse chart constructed, or `NULL' if the parse fails

Example
-------

     #include "patr.h"

     struct lex_item {
         char *         pszWord;
         char *         pszGloss;
         char *         pszCat;
         unsigned int * puiFeatures;
     };
     ...
     char **  ppszFeatureNames_g;
     PATRData sPATRData_g;
     TRIE *   pLexicon_g;
     ...
     PATREdgeList * parse(char * pszSentence_in)
     {
         PATREdgeList *    pResult = NULL;
         PATRWord *        pSentence = NULL;
         PATRWord *        pNewWord;
         PATRWord *        pPrevWord = NULL;
         int               bSaveUnification;
         int               bSaveTopDownFilter;
         char *            pszWord;
         struct lex_item * pLexItem;

         if (pszSentence_in == NULL)
             return NULL;
         /*
          *  save pointers to temporary parse structures
          */
         markPATRParseGarbage(&sPATRData_g);
         /*
          *  convert the sentence string to what parseWithPATR() wants
          */
         for (   pszWord = strtok(pszSentence_in, " \t\n") ;
                 pszWord ;
                 pszWord = strtok(NULL, " \t\n") ) {
             pLexItem = findDataInTRIE(pLexicon_g, pszWord);
             if (pLexItem == NULL) {
                 reportError(ERROR_MSG,
                             "Cannot find \"%s\" in the lexicon\n", pszWord);
                 collectPATRParseGarbage(&sPATRData_g);
                 return NULL;
             }
             pNewWord = buildPATRWordForKimmo(pszWord,
                                              pLexItem->pszGloss,
                                              pLexItem->pszCat,
                                              pLexItem->puiFeatures,
                                              ppszFeatureNames_g,
                                              &sPATRData_g);
             if (pPrevWord == NULL)             /* If first (no prev) */
                 pSentence = pNewWord;          /* Set head to this */
             else
                 pPrevWord->pNext = pNewWord;   /* Else link from prev */
             pPrevWord = pNewWord;              /* Set prev to this */
         }
         if (pSentence != NULL) {
             /*
              *  parse the sentence and save a permanent copy of the result
              */
             int iStage;
             pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
             if (pResult != NULL) {
                 pResult = storePATREdgeList(pResult, &sPATRData_g);
             }
         }
         /*
          *  Free any temporary parse structures
          */
         collectPATRParseGarbage(&sPATRData_g);
         return( pResult );
     }

     void processFile(char * pszFilename_in)
     {
         char *         pszLine;
         FILE *         pInputFP;
         PATREdgeList * pParse;

         pInputFP = fopen(pszFilename_in, "r");
         if (pInputFP == NULL)
             return;
         while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) {
             pParse = parse(pszLine);
             if (pParse != NULL) {
                 ...
                 freePATREdgeList(pParse, &sPATRData_g);
             }
         }
         fclose(pInputFP);
     }

Source File
-----------

`lcparse.c'

parseWithPATRLexicon
====================

Syntax
------

     #include "patr.h"

     int parseWithPATRLexicon(
             char *          pszSentence_in,
             FILE *          pOutputFP_in,
             PATRLexItem * (* pfMorphParser_in)(char * pszWord_in),
             int             bWarnUnusedFd_in,
             PATRData *      pPATR_in);

Description
-----------

`parseWithPATRLexicon' parses a sentence using the grammar and the internal PC-PATR lexicon already loaded into memory, optionally calling a morphological parser as a backup to that lexicon.

The arguments to `parseWithPATRLexicon' are as follows:

`pszSentence_in'
     points to a string containing a sentence to parse. The words must be separated by whitespace characters.

`pOutputFP_in'
     is an output `FILE' pointer.

`pfMorphParser_in'
     points to a function that has one argument, a character string representing a single word, and returns a pointer to a lexicon entry derived by a morphological parse of the word. If `pfMorphParser_in' is `NULL', then no morphological parsing is done as a backup to the internal PC-PATR lexicon.

`bWarnUnusedFd_in'
     allows warning messages concerning undefined feature (template) names if `TRUE'.

`pPATR_in'
     points to the data structure that contains the PC-PATR language data such as the grammar and lexicon.

Return Value
------------

the number of valid parses found for the sentence

Example
-------

See also the example for `parseWithAmpleForPATRLexicon' above.

     #include "patr.h"
     #include "opaclib.h"

     PATRData sPATRData_g;
     ...
void process(char * pszGrammar_in, char * pszLexicon_in, char * pszInput_in, char * pszOutput_in) { FILE * pInputFP; FILE * pOutputFP; char * pszLine; int iSentenceCount; int iParseCount; if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0) return; if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0) { pInputFP = fopen(pszInput_in, "r"); if (pInputFP != NULL) { pOutputFP = fopen(pszOutput_in, "w"); if (pOutputFP != NULL) { iSentenceCount = 0; while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) { ++iSentenceCount; iParseCount = parseWithPATRLexicon(pszLine, pOutputFP, NULL, FALSE, &sPATRData_g); showAmbiguousProgress(iParseCount, iSentenceCount); } fclose(pOutputFP); } fclose(pInputFP); } freePATRLexicon(&sPATRData_g); } freePATRGrammar(&sPATRData_g); freePATRInternalMemory(&sPATRData_g); } Source File ----------- `patrlexi.c' showPATRLexicon =============== Syntax ------ #include "patr.h" void showPATRLexicon(PATRData * pPATR_in); Description ----------- `showPATRLexicon' writes the internal PC-PATR lexicon to the standard output stream (`stdout'). This is useful only for debugging purposes, if then. `showPATRLexicon' has only one argument: `pPATR_in' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ none Example ------- #include "patr.h" PATRData sPATRData_g; ... void test_lexicon(char * pszLexicon_in) { if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0) { showPATRLexicon(&sPATRData_g); } } Source File ----------- `patrlexi.c' storePATREdgeList ================= Syntax ------ #include "patr.h" PATREdgeList * storePATREdgeList(PATREdgeList * pPATRResult_in, PATRData * pPATR_io); Description ----------- `storePATREdgeList' makes a permanent (unaffected by garbage collection) copy of a parse chart. It should be called after `parseWithPATR' and before `collectPATRParseGarbage'. Note that `freePATREdgeList' is used to free the memory allocated by `storePATREdgeList'. 
The arguments to `storePATREdgeList' are as follows: `pPATRResult_in' points to a parse chart returned by `parseWithPATR'. `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ a pointer to a newly allocated copy of the parse chart (PATREdgeList structure) Example ------- See the example for `parseWithPATR' above. Source File ----------- `patalloc.c' storePATRFeature ================ Syntax ------ #include "patr.h" PATRFeature * storePATRFeature(PATRFeature * pFeature_in, PATRData * pPATR_in); Description ----------- `storePATRFeature' makes a permanent (unaffected by garbage collection) copy of a feature structure. Note that `freePATRFeature' is used to free the memory allocated by `storePATRFeature'. The arguments to `storePATRFeature' are as follows: `pFeature_in' points to a feature structure that may be needed beyond the next garbage collection call. `pPATR_io' points to the data structure that contains the PC-PATR language data and internal memory storage. Return Value ------------ a pointer to a newly allocated copy of the feature structure Example ------- /*FIX ME -- THIS NEEDS TO BE WRITTEN!*/ Source File ----------- `patalloc.c' stringifyPATRParses =================== Syntax ------ #include "patr.h" int stringifyPATRParses(PATREdgeList * pParses_in, PATRData * pPATR_in, const char * pszSentence_in, char ** ppszBuffer_out); Description ----------- `stringifyPATRParses' creates a character string representation of a parse chart. The output string contains both the parse trees and the set of features indicated by the settings in the data structure pointed to by `pPATR_in'. The arguments to `stringifyPATRParses' are as follows: `pParses_in' points to a parse chart produced by `parseWithPATR'. `pPATR_in' points to a data structure that contains the PC-PATR language data and control variables. `pszSentence_in' points to a C string containing the original sentence. It may be `NULL'. 
`ppszBuffer_out'
     points to a pointer which will contain either `NULL' or the address of dynamically allocated memory containing the character string representation of the parse chart.

Return Value
------------

`-1' if an error occurs, or `0' if successful

Example
-------

     #include "patr.h"

     struct lex_item {
         char *         pszWord;
         char *         pszGloss;
         char *         pszCat;
         unsigned int * puiFeatures;
     };
     ...
     char **  ppszFeatureNames_g;
     PATRData sPATRData_g;
     TRIE *   pLexicon_g;
     ...
     char * parse(char * pszSentence_in)
     {
         PATREdgeList *    pResult = NULL;
         PATRWord *        pSentence = NULL;
         PATRWord *        pNewWord;
         PATRWord *        pPrevWord = NULL;
         int               bSaveUnification;
         int               bSaveTopDownFilter;
         char *            pszWord;
         struct lex_item * pLexItem;
         char *            pszResult = NULL;

         if (pszSentence_in == NULL)
             return NULL;
         /*
          *  save pointers to temporary parse structures
          */
         markPATRParseGarbage(&sPATRData_g);
         /*
          *  convert the sentence string to what parseWithPATR() wants
          */
         for (   pszWord = strtok(pszSentence_in, " \t\n") ;
                 pszWord ;
                 pszWord = strtok(NULL, " \t\n") ) {
             pLexItem = findDataInTRIE(pLexicon_g, pszWord);
             if (pLexItem == NULL) {
                 reportError(ERROR_MSG,
                             "Cannot find \"%s\" in the lexicon\n", pszWord);
                 collectPATRParseGarbage(&sPATRData_g);
                 return NULL;
             }
             pNewWord = buildPATRWordForKimmo(pszWord,
                                              pLexItem->pszGloss,
                                              pLexItem->pszCat,
                                              pLexItem->puiFeatures,
                                              ppszFeatureNames_g,
                                              &sPATRData_g);
             if (pPrevWord == NULL)             /* If first (no prev) */
                 pSentence = pNewWord;          /* Set head to this */
             else
                 pPrevWord->pNext = pNewWord;   /* Else link from prev */
             pPrevWord = pNewWord;              /* Set prev to this */
         }
         if (pSentence != NULL) {
             /*
              *  parse the sentence and convert the result to a string;
              *  note that the address of pszResult is passed
              */
             int iStage;
             pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
             if (pResult != NULL) {
                 stringifyPATRParses(pResult, &sPATRData_g, NULL,
                                     &pszResult);
             }
         }
         /*
          *  Free any temporary parse structures
          */
         collectPATRParseGarbage(&sPATRData_g);
         return pszResult;
     }

     void processFile(char * pszFilename_in)
     {
         char * pszLine;
         FILE * pInputFP;
         char * pszParse;
pInputFP = fopen(pszFilename_in, "r"); if (pInputFP == NULL) return; while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) { pszParse = parse(pszLine); if (pszParse != NULL) { ... freeMemory(pszParse); } } fclose(pInputFP); } Source File ----------- `patrstrg.c' writePATRLexicon ================ Syntax ------ #include "patr.h" void writePATRLexicon(FILE * pOutputFP_in, PATRData * pPATR_in); Description ----------- `writePATRLexicon' writes the internal PC-PATR lexicon to a file in a form suitable for reloading with `loadPATRLexicon'. This is most useful when a morphological parser is used to populate the lexicon. See the descriptions of `parseWithAmpleForPATRLexicon' `parseWithKimmoForPATRLexicon', and `parseWithPATRLexicon' above. The arguments to `writePATRLexicon' are as follows: `pOutputFP_in' is an output `FILE' pointer. `pPATR_in' points to the data structure that contains the PC-PATR language data such as the lexicon. Return Value ------------ none Example ------- See the example for `parseWithAmpleForPATRLexicon' above. Source File ----------- `patrlexi.c' writePATRParses =============== Syntax ------ #include "patr.h" void writePATRParses(PATREdgeList * pParses_in, FILE * pOutputFP_in, PATRData * pPATR_in); Description ----------- `writePATRParses' writes the parse trees and associated features from the parse chart pointed to by `pParses_in'. How many parse trees are written, and how they are displayed, is controlled by `pPATR_in->iMaxAmbiguities' and `pPATR_in->eTreeDisplay'. The bits in `pPATR_in->iFeatureDisplay' control which features are written, and how they are displayed in the output file. The arguments to `writePATRParses' are as follows: `pParses_in' points to a parse chart produced by `parseWithPATR'. `pOutputFP_in' is an output `FILE' pointer. `pPATR_in' points to a data structure that contains the PC-PATR language data and control variables. 
Return Value
------------

none

Example
-------

     #include "patr.h"

     struct lex_item {
         char *         pszWord;
         char *         pszGloss;
         char *         pszCat;
         unsigned int * puiFeatures;
     };
     ...
     char **  ppszFeatureNames_g;
     PATRData sPATRData_g;
     TRIE *   pLexicon_g;
     ...
     void parse(char * pszSentence_in, FILE * pOutputFP_in)
     {
         PATREdgeList *    pResult = NULL;
         PATRWord *        pSentence = NULL;
         PATRWord *        pNewWord;
         PATRWord *        pPrevWord = NULL;
         int               bSaveUnification;
         int               bSaveTopDownFilter;
         char *            pszWord;
         struct lex_item * pLexItem;
         unsigned          uiParseCount = 0;

         if ((pszSentence_in == NULL) || (pOutputFP_in == NULL))
             return;
         fprintf(pOutputFP_in, "%s\n", pszSentence_in);
         /*
          *  save pointers to temporary parse structures
          */
         markPATRParseGarbage(&sPATRData_g);
         /*
          *  convert the sentence string to what parseWithPATR() wants
          */
         for (   pszWord = strtok(pszSentence_in, " \t\n") ;
                 pszWord ;
                 pszWord = strtok(NULL, " \t\n") ) {
             pLexItem = findDataInTRIE(pLexicon_g, pszWord);
             if (pLexItem == NULL) {
                 reportError(ERROR_MSG,
                             "Cannot find \"%s\" in the lexicon\n", pszWord);
                 collectPATRParseGarbage(&sPATRData_g);
                 return;
             }
             pNewWord = buildPATRWordForKimmo(pszWord,
                                              pLexItem->pszGloss,
                                              pLexItem->pszCat,
                                              pLexItem->puiFeatures,
                                              ppszFeatureNames_g,
                                              &sPATRData_g);
             if (pPrevWord == NULL)             /* If first (no prev) */
                 pSentence = pNewWord;          /* Set head to this */
             else
                 pPrevWord->pNext = pNewWord;   /* Else link from prev */
             pPrevWord = pNewWord;              /* Set prev to this */
         }
         if (pSentence != NULL) {
             /*
              *  parse the sentence and write the result
              */
             int            iStage;
             const char *   psz = NULL;
             PATREdgeList * pel;

             pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
             if (iStage != 0)
                 fprintf(pOutputFP_in,
                         "**** Cannot parse this sentence ****\n");
             switch (iStage) {
             case 0:
                 for ( pel = pResult ; pel ; pel = pel->pNext )
                     ++uiParseCount;
                 break;
             case 1:
                 psz = "**** Turning off unification ****\n";
                 break;
             case 2:
                 psz = "**** Turning off top-down filtering ****\n";
                 break;
             case 3:
                 psz = "**** Building the largest parse \"bush\" ****\n";
                 break;
             case 4:
                 psz = "**** No output available ****\n";
                 break;
             case 5:
                 psz = "**** Out of Memory (after %lu edges) ****\n";
                 break;
             case 6:
                 psz = "**** Out of Time (after %lu edges) ****\n";
                 break;
             }
             if (psz)
                 fprintf(pOutputFP_in, psz, sPATRData_g.uiEdgesAdded);
             if (pResult) {
                 writePATRParses(pResult, pOutputFP_in, &sPATRData_g);
                 putc('\n', pOutputFP_in);
             }
         }
         else {
             fprintf(pOutputFP_in, "**** Nothing to parse ****\n");
         }
         /*
          *  Free any temporary parse structures
          */
         collectPATRParseGarbage(&sPATRData_g);
     }

     void processFile(char * pszInput_in, char * pszOutput_in)
     {
         char * pszLine;
         FILE * pInputFP;
         FILE * pOutputFP;

         pInputFP = fopen(pszInput_in, "r");
         if (pInputFP == NULL)
             return;
         pOutputFP = fopen(pszOutput_in, "w");
         if (pOutputFP == NULL) {
             fclose(pInputFP);
             return;
         }
         while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) {
             parse(pszLine, pOutputFP);
         }
         fclose(pInputFP);
         fclose(pOutputFP);
     }

Source File
-----------

`userpatr.c'

writePATRStyledOutput
=====================

Syntax
------

     #include "patr.h"

     void writePATRStyledOutput(PATREdgeList *    pParses_in,
                                char *            pszWord_in,
                                char *            pszLex_in,
                                char *            pszGloss_in,
                                FILE *            pOutputFP_in,
                                PATRFeatureTags * pFeatTags_in,
                                char *            pszParseStartTag_in,
                                char *            pszParseEndTag_in,
                                PATRData *        pPATR_in,
                                unsigned *        puiAmbigCount_io);

Description
-----------

`writePATRStyledOutput' writes the parse trees and associated features from the parse chart pointed to by `pParses_in' in a highly stylized fashion. (It was written for KTAGGER and may not be useful for any other purpose.)

The arguments to `writePATRStyledOutput' are as follows:

`pParses_in'
     points to a parse chart produced by `parseWithPATR'. Each parse tree is written as the value of the `' feature referenced by `pFeatTags_in', and its top level feature structure is written as the value of the `' feature referenced by `pFeatTags_in'.

`pszWord_in'
     points to the word (or sentence) that was parsed by `parseWithPATR'.
It is written as the value of the `' feature referenced by `pFeatTags_in'. `pszLex_in' points to a concatenated string of morphemes (words) found in the word (sentence). It is written as the value of the `' feature referenced by `pFeatTags_in'. `pszGloss_in' points to a concatenated string of glosses for the morphemes (words) found in the word (sentence). It is written as the value of the `' feature referenced by `pFeatTags_in'. `pOutputFP_in' is an output `FILE' pointer. `pFeatTags_in' points to a list of data structures containing feature paths with associated start and end tags. Feature paths that do not match one of the five special values (`', `', `', `', or `') are matched against the top level feature structure associated with the current parse. `pszParseStartTag_in' points to a string used to mark the beginning of a parse in the output file. `pszParseEndTag_in' points to a string to mark the end of a parse in the output file. `pPATR_in' points to a data structure that contains the PC-PATR language data and control variables. `puiAmbigCount_io' points to an unsigned integer that counts the number of parses pointed to by `pParses_in'. The number is added to by `writePATRStyledOutput'. Return Value ------------ none Example ------- #include #include #include "patr.h" #include "kimmo.h" #include "opaclib.h" ... KimmoData sKimmoData_g; PATRFeatureTags * pFeatureTags_g; ... 
     void process(char * pszInput_in, char * pszOutput_in)
     {
         char *        pszLine;
         char *        pszWord;
         KimmoResult * pKimmoResults;
         KimmoResult * pResult;
         char *        pszMorphGlosses = NULL;
         char *        pszMorphLexes = NULL;
         unsigned      uiAmbiguityCount;
         unsigned      uiDotsCount = 0;
         FILE *        pInputFP;
         FILE *        pOutputFP;

         pInputFP = fopen(pszInput_in, "r");
         if (pInputFP == NULL)
             return;
         pOutputFP = fopen(pszOutput_in, "w");
         if (pOutputFP == NULL) {
             fclose(pInputFP);
             return;
         }
         while ((pszLine = readLineFromFile(pInputFP, NULL, 0)) != NULL) {
             /* strspn() returns a length, so add it to the line pointer
                to skip the leading whitespace */
             pszWord = pszLine + strspn(pszLine, " \t\r\n\f");
             if (*pszWord == '\0')
                 continue;
             trimTrailingWhitespace(pszWord);
             fprintf(pOutputFP, "\n");
             pKimmoResults = applyKimmoRecognizer((unsigned char *)pszWord,
                                                  &sKimmoData_g);
             for (   pResult = pKimmoResults, uiAmbiguityCount = 0 ;
                     pResult ;
                     pResult = pResult->pNext ) {
                 pszMorphLexes = (char *)concatKimmoMorphLexemes(
                                         pResult->pAnalysis, "",
                                         &sKimmoData_g);
                 pszMorphGlosses = (char *)concatKimmoMorphGlosses(
                                         pResult->pAnalysis, "",
                                         &sKimmoData_g);
                 writePATRStyledOutput(pResult->pParseChart,
                                       pszWord,
                                       pszMorphLexes,
                                       pszMorphGlosses,
                                       pOutputFP,
                                       pFeatureTags_g,
                                       "", "",
                                       &sKimmoData_g.sPATR,
                                       &uiAmbiguityCount);
                 fprintf(pOutputFP, "\n");
                 freeMemory(pszMorphLexes);
                 freeMemory(pszMorphGlosses);
             }
             if (pKimmoResults == NULL)
                 fprintf(pOutputFP, "*** %s ***\n", pszWord);
             else
                 freeKimmoResult( pKimmoResults );
             fprintf(pOutputFP, "\n");
         }
         fclose(pInputFP);
         fclose(pOutputFP);
     }

Source File
-----------

`wrtstyle.c'

Table of Contents
*****************

Introduction to the PC-PATR function library
Variable and function naming conventions
  Preprocessor macro names
  Data structure names
  Variable names
  Function names
  Examples
PC-PATR data structures
  PATRData
  PATREdgeList
  PATRFeatureTags
  PATRLabeledFeature
  PATRWord
The PC-PATR function library global variables
  bCancelPATROperation_g
  cPATRPatchSep_g
  iPATRPatchlevel_g
  iPATRRevision_g
  iPATRVersion_g
  pszPATRCompileDate_g
  pszPATRCompileFormat_g
  pszPATRCompileTime_g
  pszPATRDate_g
  pszPATRTestVersion_g
  pszPATRYear_g
PC-PATR functions
  addPATRLexItem
  buildPATRWord
  buildPATRWordForKimmo
  collectPATRParseGarbage
  convertKimmoPATRToWordAnalyses
  freePATREdgeList
  freePATRFeature
  freePATRGrammar
  freePATRInternalMemory
  freePATRLexicon
  loadPATRGrammar
  loadPATRLexicon
  loadPATRLexiconFromAmple
  markPATRParseGarbage
  parseAmpleSentenceWithPATR
  parsePATRFeatureString
  parseWithAmpleForPATRLexicon
  parseWithKimmoForPATRLexicon
  parseWithPATR
  parseWithPATRLexicon
  showPATRLexicon
  storePATREdgeList
  storePATRFeature
  stringifyPATRParses
  writePATRLexicon
  writePATRParses
  writePATRStyledOutput