PC-PATR is an implementation for personal computers of the PATR-II computational linguistic formalism. The PATR-II formalism can be viewed as a computer language for encoding linguistic information. It does not presuppose any particular theory of syntax. It was originally developed by Stuart M. Shieber at Stanford University in the early 1980's. A PATR-II grammar consists of a set of rules and a lexicon. Each rule consists of a context-free phrase structure rule and a set of feature constraints, that is, unifications on the feature structures associated with the constituents of the phrase structure rules. The lexicon provides the items that can replace the terminal symbols of the phrase structure rules, that is, the words of the language together with their relevant features.
This function library contains the processing functions used by PC-PATR and related programs. It has been developed with the goal of making it easier to cast PATR-II style parsing into different frameworks. The first use of this library has been to add a morphotactic component to PC-Kimmo consisting of a PC-PATR word parser.
PC-PATR (and thus this function library) is still under development. The author would appreciate feedback directed to the following address:
Stephen McConnel
Language Software Development
SIL International
7500 W. Camp Wisdom Road
Dallas, TX 75236
U.S.A.

(972)708-7361 (office)
(972)708-7561 (fax)
steve@acadcomp.sil.org or Stephen_McConnel@Sil.org
The basic goal behind choosing names in the PC-PATR function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name.
Preprocessor macro names are written entirely in capital letters. If the name requires more than one word for an adequate description, the words are joined together with intervening underscore (_) characters.
Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, depending on the capitalization pattern to make them readable as separate words.
Variable names in the PC-PATR function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book Code Complete on pages 202-206.
Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix.
The type prefix has the following basic possibilities:
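b
    a Boolean, usually encoded as a char, short, or int
c
    a character, usually a char but sometimes a short or int
d
    a double precision floating point number, that is, a double
e
    an enumeration, encoded as an enum or as a char, short, or int
i
    an integer, that is, an int, short, long, or (rarely) char
s
    a data structure defined by a struct statement
sz
    a NUL (zero) terminated character string
pf
    a pointer to a function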
In addition, the basic types may be prefixed by these qualifiers:
u
    unsigned
a
    array of the basic type
p
    pointer to the basic type
The descriptive name portion of a variable name consists of one or more capitalized words concatenated together. There are no underscores (_) separating these words from each other, or from the type prefix. For the PC-PATR function library, the descriptive name for global variables begins with PATR.
The scope suffix has these possibilities:
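_g
    a global variable
_m
    a module variable (declared static at file level)
_in
    a function argument used for input
_out
    a function argument used for output
_io
    a function argument used for both input and output
_s
    a function variable that keeps its value between calls (declared static)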
The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call.
Global function names in the PC-PATR function library have two parts: a verb that is all lowercase followed by a noun phrase containing one or more capitalized words. These pieces are concatenated without any intervening underscores (_). For the PC-PATR library functions, the noun phrase section includes PATR.
Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.
SAMPLE_NAME
    a preprocessor macro
SampleName
    a data structure
pSampleName
    a local pointer variable
writeSampleName
    a global function (one that writes a SampleName).
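The conventions above can be seen side by side in a small sketch. Every name here is hypothetical and none of it comes from `patr.h'; the reading of the _m and _s suffixes as file-level and function-level static storage is an assumption.

```c
#include <stddef.h>

#define SAMPLE_LIMIT 3                 /* macro: capitals joined by underscores */

typedef struct sample_name {           /* data structure: capitalized words */
    int          iCount;               /* i      = integer */
    const char * pszLabel;             /* p + sz = pointer to a string */
} SampleName;

int        iSampleTotal_g = 0;         /* _g: global variable */
static int bSampleSeen_m  = 0;         /* _m: file-level static (assumed meaning) */

/* verb + noun phrase; _in and _io mark the direction of each parameter */
int updateSampleName(const char * pszLabel_in, SampleName * pSample_io)
{
    static int iCallCount_s = 0;       /* _s: static local (assumed meaning) */
    int        bAccepted;              /* no suffix: ordinary stack local */

    ++iCallCount_s;
    bAccepted = (pszLabel_in != NULL) && (pSample_io != NULL)
                && (pSample_io->iCount < SAMPLE_LIMIT);
    if (bAccepted) {
        pSample_io->pszLabel = pszLabel_in;
        ++pSample_io->iCount;
        bSampleSeen_m = 1;
        ++iSampleTotal_g;
    }
    return bAccepted;
}

/* reports whether any update has been accepted so far */
int isSampleSeen(void)
{
    return bSampleSeen_m;
}
```

Reading a declaration such as `static int bSampleSeen_m` then tells a maintainer the type (Boolean) and the scope (this file only) without looking anything up.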
The PC-PATR functions operate on a number of different data structures. The most important of these are described in the following sections. The PC-PATR functions also use a number of other data structures internally, but it should not be necessary for a programmer to manipulate them directly.
    /*
     *  forward declarations of internal PATR data types
     */
    typedef struct patr_grammar PATRGrammar;
    typedef struct patr_lexicon PATRLexicon;

    typedef struct {
        char          bFailure;
        char          bUnification;
        char          eTreeDisplay;
        char          bGloss;
        char          bGlossesExist;
        char          iFeatureDisplay;
        char          bCheckCycles;
        char          bTopDownFilter;
        short         iMaxAmbiguities;
        short         iDebugLevel;
        char          cComment;
        char          bSilent;
        char          bShowWarnings;
        char          bPromoteDefAtoms;
        time_t        iMaxProcTime;
        FILE *        pLogFP;
        char *        pszGrammarFile;
        PATRGrammar * pGrammar;
        char *        pszRecordMarker;
        char *        pszWordMarker;
        char *        pszGlossMarker;
        char *        pszCategoryMarker;
        char *        pszFeatureMarker;
        PATRLexicon * pLexicon;
        int           iCurrentIndex;
        int           iParseCount;
    } PATRData;
The PATRData data structure collects the information used for data processing within the PC-PATR functions. Its general purpose is to reduce the number of parameters needed by the various functions.
bFailure
TRUE
(nonzero).
bUnification
TRUE
(nonzero). If
FALSE
, the parser acts only as a context free chart parser,
which usually produces much more ambiguous output.
eTreeDisplay
PATR_NO_TREE
PATR_FLAT_TREE
(S (NP (N cows))(VP (VerbalP (V eat))(NP (N grass))))
PATR_FULL_TREE
              S
         _____|_____
        NP         VP
        |       ___|____
        N    VerbalP   NP
       cows     |       |
                V       N
               eat    grass
PATR_INDENTED_TREE
    S
        NP
            N cows
        VP
            VerbalP
                V eat
            NP
                N grass
bGloss
TRUE
.
bGlossesExist
iFeatureDisplay
iFeatureDisplay & PATR_FEATURE_ON
iFeatureDisplay & PATR_FEATURE_FLAT
    S:
    [ cat:   S
      head:  [ agr:   $1[ 3sg: - ]
               finite:+
               pos:   V
               vform: BASE ]
      subj:  [ cat:   NP
               head:  [ agr:   $1
                        case:  NOM
                        number:PL
                        pos:   N
                        proper:-
                        verbal:- ] ] ]

On the other hand, if this bit is set, the same feature structure would be written like this:
S: [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ] subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N proper:- verbal:- ] ] ]
iFeatureDisplay & PATR_FEATURE_ALL
    S_1: [ cat:S head:[ agr:$1[ 3sg:- ] finite:+ pos:V vform:BASE ] subj:[ cat:NP head:[ agr:$1 case:NOM number:PL pos:N proper:- verbal:- ] ] ]
    NP_2: [ cat:NP head:[ agr:[ 3sg:- ] number:PL pos:N proper:- verbal:- ] ]
    N_3: [ cat:N gloss:`cow head:[ agr:[ 3sg:- ] number:PL pos:N proper:- verbal:- ] lex:cows root_pos:N ]
    VP_4: [ cat:VP head:[ finite:+ pos:V vform:BASE ] ]
    VerbalP_5: [ cat:VerbalP head:[ finite:+ pos:V vform:BASE ] ]
    V_6: [ cat:V gloss:`eat head:[ pos:V vform:BASE ] lex:eat root_pos:V ]
    NP_7: [ cat:NP head:[ agr:[ 3sg:+ ] number:SG pos:N proper:- verbal:- ] ]
    N_8: [ cat:N gloss:`grass head:[ agr:[ 3sg:+ ] number:SG pos:N proper:- verbal:- ] lex:grass root_pos:N ]
iFeatureDisplay & PATR_FEATURE_TRIM
    VP_3:
    [ cat:    VP
      head:   [ form:  finite
                trans: [ pred: sleep
                         arg1: $1[]
                         arg2: [] ] ]
      syncat: [ first: [ cat:  NP
                         head: [ agreement: [ person:third
                                              number:singular ]
                                 trans:     $1 ] ]
                rest:  end ] ]

If this bit is set, the same data structure would look like this:

    VP_3:
    [ cat:    VP
      head:   [ form:  finite
                trans: [ pred: sleep ] ]
      syncat: [ first: [ cat:  NP
                         head: [ agreement: [ person:third
                                              number:singular ] ] ]
                rest:  end ] ]
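Because iFeatureDisplay packs several independent switches into one char, each switch is tested with a bitwise AND, as the expressions above show. The sketch below illustrates the idea with hypothetical DEMO_ constants; the actual bit values of the PATR_FEATURE_ symbols are defined in `patr.h' and are not reproduced here.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this illustration. */
enum {
    DEMO_FEATURE_ON   = 0x01,
    DEMO_FEATURE_FLAT = 0x02,
    DEMO_FEATURE_ALL  = 0x04,
    DEMO_FEATURE_TRIM = 0x08
};

/*
 * Write a comma-separated summary of the display switches into
 * pszBuffer_out and return the value reported by snprintf().
 */
int describeDemoFeatureDisplay(int iFeatureDisplay_in,
                               char * pszBuffer_out, size_t uiSize_in)
{
    return snprintf(pszBuffer_out, uiSize_in, "%s,%s,%s,%s",
        (iFeatureDisplay_in & DEMO_FEATURE_ON)   ? "on"      : "off",
        (iFeatureDisplay_in & DEMO_FEATURE_FLAT) ? "flat"    : "indented",
        (iFeatureDisplay_in & DEMO_FEATURE_ALL)  ? "all"     : "not-all",
        (iFeatureDisplay_in & DEMO_FEATURE_TRIM) ? "trimmed" : "untrimmed");
}
```

Switches are combined with bitwise OR, for example `DEMO_FEATURE_ON | DEMO_FEATURE_FLAT`, and cleared with `&= ~FLAG`.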
bCheckCycles
bTopDownFilter
iMaxAmbiguities
iDebugLevel
cComment
PATR_DEFAULT_COMMENT
is a symbol for the default value.)
bSilent
stderr
).
bShowWarnings
bPromoteDefAtoms
iMaxProcTime
0
means no limit.
pLogFP
FILE pointer for an output log file (NULL means none).
pszGrammarFile
NULL
means none).
pGrammar
NULL
means none).
pszRecordMarker
pszWordMarker
pszGlossMarker
pszCategoryMarker
pszFeatureMarker
pLexicon
NULL
means none).
iCurrentIndex
iParseCount
`patr.h'
    /*
     *  forward declaration of an internal PATR data type
     */
    typedef struct patr_edge PATREdge;

    typedef struct patr_edge_list {
        PATREdge *              pEdge;
        struct patr_edge_list * pNext;
    } PATREdgeList;
The PATREdgeList
data structure encodes a list of parse results
returned by the PC-PATR parsing functions.
pEdge
pNext
`patr.h'
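Since PATREdgeList is an ordinary singly linked list, callers can walk it without touching the opaque PATREdge type. The following is a minimal sketch: the type definitions are copied from the declaration above so that the fragment compiles on its own (real code would include `patr.h' instead), and countEdgeListEntries is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Copied from the declaration above; PATREdge stays opaque. */
typedef struct patr_edge PATREdge;
typedef struct patr_edge_list {
    PATREdge *              pEdge;
    struct patr_edge_list * pNext;
} PATREdgeList;

/* Count the entries in a list of parse results (hypothetical helper). */
size_t countEdgeListEntries(const PATREdgeList * pList_in)
{
    const PATREdgeList * pItem;
    size_t               uiCount = 0;

    for (pItem = pList_in; pItem != NULL; pItem = pItem->pNext)
        ++uiCount;
    return uiCount;
}
```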
    #include "strlist.h"

    typedef struct patr_feat_tags {
        StringList *            pFeaturePath;
        char *                  pszStartTag;
        char *                  pszEndTag;
        struct patr_feat_tags * pNext;
    } PATRFeatureTags;
The PATRFeatureTags
data structure contains information needed
to write feature structures to an output file in a stylized fashion.
pFeaturePath
pszStartTag
pszEndTag
pNext
PATRFeatureTags
data structure. This
facilitates building a list of such items.
`patr.h'
    /*
     *  forward declaration of an internal PATR data type
     */
    typedef struct patr_feature PATRFeature;

    typedef struct patr_labeled_feat {
        char *                     pszLabel;
        PATRFeature *              pFeature;
        struct patr_labeled_feat * pNext;
    } PATRLabeledFeature;
The PATRLabeledFeature
data structure contains information
needed to abbreviate a feature structure to a simple label (template
name) while writing an output file.
pszLabel
pFeature
value
pFeature
pNext
PATRLabeledFeature
data structure. This
facilitates building a list of such items.
`patr.h'
    /*
     *  forward declaration of an internal PATR data type
     */
    typedef struct patr_categ PATRWordCategory;

    typedef struct patr_word {
        int                iWordNumber;
        char *             pszWordName;
        PATRWordCategory * pCategories;
        struct patr_word * pNext;
    } PATRWord;
The PATRWord
data structure represents a single word of the
sentence fed to the PC-PATR parsing function. A sentence is
represented by a linked list of these data structures.
iWordNumber
pszWordName
pCategories
pNext
NULL
marks the end of
the sentence.
`patr.h'
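A sentence is therefore built by linking PATRWord nodes in order and leaving the last pNext as NULL. The sketch below shows one way to append nodes at the tail. The type definitions are copied from the declaration above so the fragment stands alone, and appendDemoWord and countDemoWords are hypothetical helpers; in real code the nodes would come from buildPATRWord or buildPATRWordForKimmo.

```c
#include <stddef.h>

/* Copied from the declaration above; PATRWordCategory stays opaque. */
typedef struct patr_categ PATRWordCategory;
typedef struct patr_word {
    int                iWordNumber;
    char *             pszWordName;
    PATRWordCategory * pCategories;
    struct patr_word * pNext;
} PATRWord;

/* Link pNew_io at the end of the sentence, keeping head and tail current. */
void appendDemoWord(PATRWord ** ppHead_io, PATRWord ** ppTail_io,
                    PATRWord * pNew_io)
{
    pNew_io->pNext = NULL;              /* NULL marks the end of the sentence */
    if (*ppTail_io == NULL)
        *ppHead_io = pNew_io;           /* first word becomes the head */
    else
        (*ppTail_io)->pNext = pNew_io;  /* otherwise link from the old tail */
    *ppTail_io = pNew_io;
}

/* Count the words in a sentence. */
int countDemoWords(const PATRWord * pSentence_in)
{
    int iCount = 0;

    for ( ; pSentence_in != NULL; pSentence_in = pSentence_in->pNext)
        ++iCount;
    return iCount;
}
```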
This chapter gives the proper usage information about each of the global variables found in the PC-PATR function library. The `patr.h' header file contains the extern declarations for all of these variables.
#include "patr.h" extern int bCancelPATROperation_g;
bCancelPATROperation_g
can be set asynchronously to interrupt a
PC-PATR parse that seems to be stuck.
4.1.3 Example
    #include <signal.h>
    #include "patr.h"
    ...
    void sigint_handler(int iSignal_in)
    {
        bCancelPATROperation_g = TRUE;
        signal(SIGINT, sigint_handler);
    }
    ...
    signal(SIGINT, sigint_handler);
    ...
`patrdata.c'
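The cancellation flag follows a common pattern: a signal handler sets a flag, and the long-running loop polls it. The sketch below reproduces that pattern without the PC-PATR library. All names are hypothetical, and it uses `volatile sig_atomic_t`, the type the C standard guarantees can be written safely from a handler, whereas the library declares its flag as a plain int.

```c
#include <signal.h>

/* Hypothetical stand-in for a cancellation flag set from a handler. */
static volatile sig_atomic_t bDemoCancel_m = 0;

static void handleDemoInterrupt(int iSignal_in)
{
    (void)iSignal_in;
    bDemoCancel_m = 1;
}

/*
 * Run up to iMaxSteps_in steps, polling the flag before each one.
 * To keep the sketch self-contained, SIGINT is raised by the loop
 * itself at step iInterruptAt_in (pass -1 for no interruption).
 * Returns the number of steps started.
 */
long runDemoLoop(long iMaxSteps_in, long iInterruptAt_in)
{
    long i;

    bDemoCancel_m = 0;
    signal(SIGINT, handleDemoInterrupt);
    for (i = 0; (i < iMaxSteps_in) && !bDemoCancel_m; ++i) {
        if (i == iInterruptAt_in)
            raise(SIGINT);              /* simulates the user pressing Ctrl-C */
    }
    signal(SIGINT, SIG_DFL);            /* restore the default handler */
    return i;
}
```

The loop stops at the first check after the flag is set, so the step in progress always finishes before the cancellation takes effect.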
#include "patr.h" extern const char cPATRPatchSep_g;
cPATRPatchSep_g
is used to separate the revision and patch
level values when printing the PC-PATR version number. 'a'
indicates an alpha release, 'b'
indicates a beta release, and
'.'
indicates a production release.
4.2.3 Example See section 4.5 iPATRVersion_g.
4.2.4 Source File `patrdata.c'
#include "patr.h" extern const int iPATRPatchlevel_g;
iPATRPatchlevel_g
is the current patch level of the
PC-PATR function library and program. This is the third level version
number, reflecting bug fixes or internal improvements that should be
functionally invisible to users.
4.3.3 Example See section 4.5 iPATRVersion_g.
4.3.4 Source File `patrdata.c'
#include "patr.h" extern const int iPATRRevision_g;
iPATRRevision_g
is the current revision level of the
PC-PATR function library and program. This is the second level version
number, reflecting changes to program behavior that require changes to
the PC-PATR Reference Manual.
4.4.3 Example See section 4.5 iPATRVersion_g.
4.4.4 Source File `patrdata.c'
#include "patr.h" extern const int iPATRVersion_g;
iPATRVersion_g
is the current version number of the
PC-PATR function library and program. This is the top level version
number, reflecting a major rewrite of the program or major changes that
make it incompatible with earlier versions of the program.
4.5.3 Example
    #include <stdio.h>
    #include "patr.h"
    ...
    fprintf(stderr, "PC-PATR version %d.%d%c%d (%s), Copyright %s SIL\n",
            iPATRVersion_g, iPATRRevision_g, cPATRPatchSep_g,
            iPATRPatchlevel_g, pszPATRDate_g, pszPATRYear_g);
    #ifdef __DATE__
    fprintf(stderr, pszPATRCompileFormat_g,
            pszPATRCompileDate_g, pszPATRCompileTime_g);
    #else
    if (pszPATRTestVersion_g != NULL)
        fputs(pszPATRTestVersion_g, stderr);
    #endif
    ...
`patrdata.c'
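The version, revision, separator, and patch level combine into a single string such as 1.2b3. The sketch below shows that composition with caller-supplied values instead of the library globals. Both function names are hypothetical, and the numbers used with them are made up for illustration rather than taken from any real PC-PATR release.

```c
#include <stdio.h>
#include <stddef.h>

/* Compose "version.revision<sep>patchlevel", e.g. "1.2b3". */
int formatDemoVersion(char * pszBuffer_out, size_t uiSize_in,
                      int iVersion_in, int iRevision_in,
                      char cPatchSep_in, int iPatchlevel_in)
{
    return snprintf(pszBuffer_out, uiSize_in, "%d.%d%c%d",
                    iVersion_in, iRevision_in, cPatchSep_in, iPatchlevel_in);
}

/* Map the separator character to the release type it stands for. */
const char * describeDemoRelease(char cPatchSep_in)
{
    switch (cPatchSep_in) {
    case 'a': return "alpha";
    case 'b': return "beta";
    case '.': return "production";
    default:  return "unknown";
    }
}
```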
    #include "patr.h"

    #ifdef __DATE__
    extern const char * pszPATRCompileDate_g;
    #endif
pszPATRCompileDate_g
points to a string containing the date on
which the PC-PATR library was compiled. It exists only if the C
compiler preprocessor supports the __DATE__
constant.
4.6.3 Example See section 4.5 iPATRVersion_g.
4.6.4 Source File `patrdata.c'
    #include "patr.h"

    #ifdef __DATE__
    #ifdef __TIME__
    extern const char * pszPATRCompileFormat_g;
    #endif
    #endif
pszPATRCompileFormat_g
points to a printf
style format
string suitable for displaying pszPATRCompileDate_g
and
pszPATRCompileTime_g
. It exists only if the C compiler
preprocessor supports the __DATE__
and __TIME__
constants.
4.7.3 Example See section 4.5 iPATRVersion_g.
4.7.4 Source File `patrdata.c'
    #include "patr.h"

    #ifdef __TIME__
    extern const char * pszPATRCompileTime_g;
    #endif
pszPATRCompileTime_g
points to a string containing the time at
which the PC-PATR library was compiled. It exists only if the C
compiler preprocessor supports the __TIME__
constant.
4.8.3 Example See section 4.5 iPATRVersion_g.
4.8.4 Source File `patrdata.c'
#include "patr.h" extern const char * pszPATRDate_g;
pszPATRDate_g
points to a string containing the date on
which the PC-PATR library was last modified.
4.9.3 Example See section 4.5 iPATRVersion_g.
4.9.4 Source File `patrdata.c'
    #include "patr.h"

    #ifndef __DATE__
    extern const char * pszPATRTestVersion_g;
    #endif
pszPATRTestVersion_g
points to a string describing the test
status of PC-PATR (either alpha or beta). If this is a production
release version, it is set to NULL
. It is defined only if the C
compiler preprocessor does not support the __DATE__
constant.
4.10.3 Example See section 4.5 iPATRVersion_g.
4.10.4 Source File `patrdata.c'
#include "patr.h" extern const char * pszPATRYear_g;
pszPATRYear_g
points to a string containing the year in
which the PC-PATR library was last modified. This is suitable for a
copyright notice assigning the copyright to SIL International.
4.11.3 Example See section 4.5 iPATRVersion_g.
4.11.4 Source File `patrdata.c'
This document gives the proper usage information about each of the functions found in the PC-PATR function library. The prototypes and type definitions relevant to the use of these functions are all found in the `patr.h' header file.
#include "patr.h" void addPATRLexItem(char * pszWord_in, char * pszGloss_in, char * pszCategory_in, char * pszFeatures_in, PATRFeature * pFeature_in, PATRData * pPATR_io);
addPATRLexItem
adds one entry to the PC-PATR lexicon stored in
memory.
The arguments to addPATRLexItem
are as follows:
pszWord_in
pszGloss_in
pszCategory_in
pszFeatures_in
pFeature_in
pszFeatures_in
.
pPATR_io
none
5.1.4 Example
    #include <string.h>
    #include "patr.h"
    #include "opaclib.h"

    void storeTemplate(WordTemplate * pTemplate_in,
                       PATRData *     pPATR_in)
    {
        WordAnalysis * pAnal;
        char *         pszFeatures;
        char *         p;

        for ( pAnal = pTemplate_in->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
        {
            pszFeatures = NULL;
            if (pAnal->pszFeatures != NULL)
            {
                /* work on a copy, replacing each '=' with a space */
                pszFeatures = duplicateString(pAnal->pszFeatures);
                while ((p = strchr(pszFeatures, '=')) != NULL)
                    *p = ' ';
            }
            addPATRLexItem(pTemplate_in->pszSurfaceForm,
                           pAnal->pszAnalysis,
                           pAnal->pszCategory,
                           pszFeatures,
                           NULL,
                           pPATR_in);
            if (pszFeatures != NULL)
            {
                freeMemory(pszFeatures);
                pszFeatures = NULL;
            }
        }
    }
`patrlexi.c'
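The example above copies each feature string before editing it, then replaces every `=` with a space prior to calling addPATRLexItem. That preprocessing step can be isolated as shown below. copyDemoFeatures is a hypothetical helper that uses the standard malloc in place of the duplicateString and freeMemory functions that the example uses.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of pszFeatures_in in which every '='
 * has been replaced by a space, or NULL if the input is NULL or the
 * allocation fails.  The caller releases the result with free().
 */
char * copyDemoFeatures(const char * pszFeatures_in)
{
    size_t uiSize;
    char * pszCopy;
    char * p;

    if (pszFeatures_in == NULL)
        return NULL;
    uiSize  = strlen(pszFeatures_in) + 1;
    pszCopy = malloc(uiSize);
    if (pszCopy == NULL)
        return NULL;
    memcpy(pszCopy, pszFeatures_in, uiSize);
    while ((p = strchr(pszCopy, '=')) != NULL)
        *p = ' ';
    return pszCopy;
}
```

Working on a copy leaves the caller's analysis data untouched, which matters when the same analysis is written out later.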
#include "patr.h" PATRWord * buildPATRWord(char * pszLex_in, char * pszGloss_in, char * pszCat_in, char * pszFeatures_in, PATRFeature * pPATRFeature_in, PATRData * pPATR_in);
buildPATRWord
converts the given information into the form needed
for a PATR parse. This is used by the (X)AMPLE program in preparing a
proposed word analysis for parsing with a word grammar.
The arguments to buildPATRWord
are as follows:
pszLex_in
NULL
.
pszGloss_in
NULL
.
pszCat_in
NULL
.
pszFeatures_in
=
). It may be NULL
.
pPATRFeature_in
NULL
.
pPATR_in
NULL
.
a pointer to a dynamically allocated PATRWord data structure containing the morpheme information
5.2.4 Example
    #include "ample.h"          /* #includes "patr.h" */
    #include "ampledef.h"
    ...
    PATREdgeList * perform_word_parse(AmpleHeadList * pAnal_in,
                                      AmpleData *     pAmple_in)
    {
        AmpleHeadList *  hp;
        AmpleAllomorph * ap;
        AmpleMorpheme *  mp;
        PATRWord *       pWord     = NULL;
        PATRWord *       pNewMorph = NULL;
        char *           pszLex;
        char *           pszGloss;
        char *           pszPATRCategory;
        char *           pszFromCat;
        char *           pszToCat;
        char *           pszProps;
        char *           pszFeatures;
        /*
         *  Convert the list of morphemes to what parseWithPATR() wants.
         */
        for ( hp = pAnal_in ; hp ; hp = hp->pLeft )
        {
            ap = hp->pAllomorph;
            if (ap == NULL)
                continue;               /* should never happen */
            mp = ap->pMorpheme;
            if (mp == NULL)
                continue;               /* should never happen */
            if (mp->pszUnderForm != NULL)
                pszLex = mp->pszUnderForm;
            else
                pszLex = ap->pszAllomorph;
            if ((pszLex == NULL) || (*pszLex == NUL))
                pszLex = "0";
            pszGloss = mp->pszMorphName;
            if (mp->pszPATRCat != NULL)
            {
                pszPATRCategory = mp->pszPATRCat;
            }
            else
            {
                switch (hp->eType)
                {
                case AMPLE_PFX:
                    pszPATRCategory = "Prefix";
                    break;
                case AMPLE_IFX:
                    if (    (hp->pRight == NULL) ||
                            (hp->pRight->eType == AMPLE_SFX) )
                        pszPATRCategory = "Suffix";
                    else
                        pszPATRCategory = "Prefix";
                    break;
                case AMPLE_SFX:
                    pszPATRCategory = "Suffix";
                    break;
                default:
                    pszPATRCategory = "Root";
                    break;
                }
            }
            if (hp->eType == AMPLE_ROOT)
                pszFromCat = NULL;
            else
                pszFromCat = findAmpleCategoryName(get_from(hp),
                                                   pAmple_in->pCategories);
            pszToCat = findAmpleCategoryName(get_to(hp),
                                             pAmple_in->pCategories);
            pszProps = build_prop_string(ap->sPropertySet,
                                         &pAmple_in->sProperties);
            pszFeatures = build_feature_string(mp->pszMorphFd,
                                               pszFromCat, pszToCat,
                                               pszProps ? pszProps : "");
            if (pszProps != NULL)
                freeMemory(pszProps);
            pNewMorph = buildPATRWord(pszLex, pszGloss, pszPATRCategory,
                                      pszFeatures, mp->pPATRFeature,
                                      &pAmple_in->sPATR);
            freeMemory(pszFeatures);
            /* the list is walked right to left, so prepend each morpheme */
            pNewMorph->pNext = pWord;
            pWord            = pNewMorph;
        }
        ...
    }
`patalloc.c'
#include "patr.h" PATRWord * buildPATRWordForKimmo(char * pszLex_in, char * pszGloss_in, char * pszCat_in, unsigned short * puiFeatIndexes_in, char ** ppszFeatures_in, PATRData * pPATR_in);
buildPATRWordForKimmo
converts the supplied
information into the form needed to apply a PC-PATR analysis.
The arguments to buildPATRWordForKimmo
are as follows:
pszLex_in
pszGloss_in
pszCat_in
puiFeatIndexes_in
ppszFeatures_in
puiFeatIndexes_in
.
pPATR_in
a pointer to a dynamically allocated PATRWord
data structure
encoding the supplied information
5.3.4 Example See section 5.19 parseWithPATR.
5.3.5 Source File `patrkimm.c'
#include "patr.h" void collectPATRParseGarbage(PATRData * pPATR_io);
collectPATRParseGarbage cleans up the memory used by parseWithPATR. If the parse results are wanted for an extended period of time, then storePATREdgeList must be called after parseWithPATR and before collectPATRParseGarbage.
collectPATRParseGarbage has only one argument:
pPATR_io
none
5.4.4 Example See section 5.19 parseWithPATR.
5.4.5 Source File `patalloc.c'
#include "patr.h" #include "kimmo.h" WordAnalysis * convertKimmoPATRToWordAnalyses( KimmoResult * pKimmoResult_in, KimmoData * pKimmo_in, StringList * pCategoryPath_in, int cDecomp_in, PATRLabeledFeature * pFdDefinitions_in, WordAnalysis * pAnalyses_io, unsigned * puiAmbigCount_io, PATRData * pPATR_io);
convertKimmoPATRToWordAnalyses
converts the result of a PC-Kimmo analysis into a form suitable for
output via the writeTemplate
function. This is part of the PATR
function library rather than the Kimmo library because it requires
fiddling with feature structures internal to the PATR library.
The arguments to convertKimmoPATRToWordAnalyses
are as follows:
pKimmoResult_in
applyKimmoRecognizer
.
pKimmo_in
applyKimmoRecognizer
.
pCategoryPath_in
cDecomp_in
pFdDefinitions_in
pAnalyses_io
puiAmbigCount_io
a pointer to a list of word analysis data structures
5.5.4 Example
    #include <stdio.h>
    #include "patr.h"
    #include "kimmo.h"
    ...
    PATRLabeledFeature * pFdDefinitions_g = NULL;
    ...
    static void analyzeFile(FILE *        pInputFP_in,
                            FILE *        pOutputFP_in,
                            char *        pszOutputFile_in,
                            TextControl * pTextControl_in,
                            KimmoData *   pKimmo_in)
    {
        WordTemplate *  pWord;
        WordAnalysis *  pAnal;
        KimmoResult *   pResult;
        unsigned        uiAmbiguityCount;
        unsigned char * pszWord;
        size_t          i;

        while ((pWord = readTemplateFromText(pInputFP_in,
                                             pTextControl_in)) != NULL)
        {
            pWord->iOutputFlags = WANT_DECOMPOSITION | WANT_ORIGINAL;
            if (pWord->paWord != NULL)
            {
                uiAmbiguityCount = 0;
                for ( i = 0 ; pWord->paWord[i] ; ++i )
                {
                    pszWord = (unsigned char *)pWord->paWord[i];
                    pResult = applyKimmoRecognizer(pszWord, pKimmo_in);
                    if (pResult != NULL)
                    {
                        /*
                         *  The prototype takes a final PATRData argument;
                         *  passing the one embedded in KimmoData is an
                         *  assumption based on the sPATR member used in
                         *  other examples in this manual.
                         */
                        pWord->pAnalyses = convertKimmoPATRToWordAnalyses(
                                                pResult, pKimmo_in,
                                                pCatPath_g,
                                                pTextControl_in->cDecomp,
                                                pFdDefinitions_g,
                                                pWord->pAnalyses,
                                                &uiAmbiguityCount,
                                                &pKimmo_in->sPATR);
                        freeKimmoResult( pResult );
                        /*
                         *  adjust output for available fields
                         */
                        pWord->iOutputFlags &= ~WANT_FEATURES;
                        pWord->iOutputFlags &= ~WANT_CATEGORY;
                        for (   pAnal = pWord->pAnalyses ;
                                pAnal ;
                                pAnal = pAnal->pNext )
                        {
                            if (    (pAnal->pszFeatures != NULL) &&
                                    (*pAnal->pszFeatures != NUL) )
                                pWord->iOutputFlags |= WANT_FEATURES;
                            if (    (pAnal->pszCategory != NULL) &&
                                    (*pAnal->pszCategory != NUL) )
                                pWord->iOutputFlags |= WANT_CATEGORY;
                        }
                    }
                }
            }
            writeTemplate(pOutputFP_in, pszOutputFile_in, pWord,
                          pTextControl_in );
            freeWordTemplate( pWord );
        }
    }
`cvtkp2wa.c'
#include "patr.h" void freePATREdgeList(PATREdgeList * pPATRResult_io, PATRData * pPATR_io);
freePATREdgeList
frees the memory allocated for a parse chart.
The arguments to freePATREdgeList
are as follows:
pPATRResult_io
storePATREdgeList
.
pPATR_io
none
5.6.4 Example See section 5.19 parseWithPATR.
5.6.5 Source File `patalloc.c'
#include "patr.h" void freePATRFeature(PATRFeature * pFeature_io, PATRData * pPATR_io);
freePATRFeature
frees the memory allocated for a PC-PATR feature
structure.
The arguments to freePATRFeature
are as follows:
pFeature_io
pPATR_io
none
5.7.4 Example
    #include "patr.h"
    ...
    static void free_word_categs(PATRWordCategory * pwc,
                                 PATRData *         pThis)
    {
        if (pwc)
        {
            freeMemory(pwc->pszCategory);
            freePATRFeature(pwc->pFeature, pThis);
            free_word_categs(pwc->pNext, pThis);
            freeMemory(pwc);
        }
    }
`patalloc.c'
#include "patr.h" void freePATRGrammar(PATRData * pPATR_io);
freePATRGrammar
frees the memory allocated for a PC-PATR grammar.
freePATRGrammar
has only one argument:
pPATR_io
none
5.8.4 Example See section 5.20 parseWithPATRLexicon.
5.8.5 Source File `grammar.c'
#include "patr.h" void freePATRInternalMemory(PATRData * pPATR_io);
freePATRInternalMemory
frees some memory used internally by
various PC-PATR library functions. It should be called only if the
grammar and lexicon have already been freed.
freePATRInternalMemory
has only one argument:
pPATR_io
none
5.9.4 Example See section 5.20 parseWithPATRLexicon.
5.9.5 Source File `patrfunc.c'
#include "patr.h" void freePATRLexicon(PATRData * pPATR_io);
freePATRLexicon
frees the memory allocated for storing the PC-PATR
lexicon.
freePATRLexicon
has only one argument:
pPATR_io
none
5.10.4 Example See section 5.20 parseWithPATRLexicon.
5.10.5 Source File `patrlexi.c'
#include "patr.h" int loadPATRGrammar(const char * pszGrammarFile_in, PATRData * pPATR_io);
loadPATRGrammar
loads the PC-PATR grammar from a file into
memory. The entire grammar must fit into a single file.
The arguments to loadPATRGrammar
are as follows:
pszGrammarFile_in
pPATR_io
zero if an error occurs while loading the grammar, otherwise a non-zero value
5.11.4 Example See section 5.20 parseWithPATRLexicon.
5.11.5 Source File `grammar.c'
#include "patr.h" int loadPATRLexicon(const char * pszLexiconFile_in, PATRData * pPATR_io);
loadPATRLexicon
loads a PC-PATR lexicon file into memory. The
lexicon may be spread out across several files, with a separate call to
loadPATRLexicon
for each file.
The arguments to loadPATRLexicon
are as follows:
pszLexiconFile_in
pPATR_io
zero if an error occurs while loading the lexicon, otherwise a non-zero value
5.12.4 Example See section 5.20 parseWithPATRLexicon.
5.12.5 Source File `patrlexi.c'
#include "patr.h" #include "opaclib.h" int loadPATRLexiconFromAmple(const char * pszAnalysisFile_in, TextControl * pTextControl_in, PATRData * pPATR_io);
loadPATRLexiconFromAmple
loads an AMPLE style analysis file into
the PC-PATR lexicon in memory. Several such files may be loaded to
fill in the lexicon.
The arguments to loadPATRLexiconFromAmple
are as follows:
pszAnalysisFile_in
pTextControl_in
pPATR_io
zero if an error occurs, or a non-zero value if lexicon entries are successfully loaded from the analysis file
5.13.4 Example
    #include "patr.h"
    #include "opaclib.h"

    PATRData    sPATRData_g;
    TextControl sTextControl_g;
    ...
    void processUsingAmple(char * pszGrammar_in,
                           char * pszAnalysis_in,
                           char * pszInput_in,
                           char * pszOutput_in)
    {
        FILE * pInputFP;
        FILE * pOutputFP;
        char * pszLine;
        int    iSentenceCount;
        int    iParseCount;

        if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0)
            return;
        if (loadPATRLexiconFromAmple(pszAnalysis_in,
                                     &sTextControl_g,
                                     &sPATRData_g) != 0)
        {
            pInputFP = fopen(pszInput_in, "r");
            if (pInputFP != NULL)
            {
                pOutputFP = fopen(pszOutput_in, "w");
                if (pOutputFP != NULL)
                {
                    iSentenceCount = 0;
                    while ((pszLine = readLineFromFile(pInputFP,
                                                       NULL, '\0')) != NULL)
                    {
                        ++iSentenceCount;
                        iParseCount = parseWithPATRLexicon(pszLine,
                                                           pOutputFP,
                                                           NULL, FALSE,
                                                           &sPATRData_g);
                        showAmbiguousProgress(iParseCount, iSentenceCount);
                    }
                    fclose(pOutputFP);
                }
                fclose(pInputFP);
            }
            freePATRLexicon(&sPATRData_g);
        }
        freePATRGrammar(&sPATRData_g);
        freePATRInternalMemory(&sPATRData_g);
    }
`patrampl.c'
#include "patr.h" void markPATRParseGarbage(PATRData * pPATR_io);
markPATRParseGarbage
sets a garbage collection marker. Since C
does not support automatic garbage collection, and the unification
algorithm can lose track of allocated feature structures, special work
must be done to keep memory from leaking away. This function must be
called before calling parseWithPATR
, and
collectPATRParseGarbage
must be called afterwards.
markPATRParseGarbage
has only one argument:
pPATR_io
none
5.14.4 Example See section 5.19 parseWithPATR.
5.14.5 Source File `patalloc.c'
#include "patr.h" int parseAmpleSentenceWithPATR(WordTemplate ** pWords_in, FILE * pOutputFP_in, char * pszOutputFile_in, int bWarnUnusedFd_in, int bVerbose_in, int bWriteAmpleParses_in, TextControl * pTextControl_in, PATRData * pPATR_in);
parseAmpleSentenceWithPATR
tries to parse a sentence loaded from
an AMPLE analysis file, possibly disambiguating the morphological
analyses as a side-effect. It requires that a PC-PATR grammar be
loaded, but does not use the PC-PATR lexicon.
The arguments to parseAmpleSentenceWithPATR
are as follows:
pWords_in
NULL
terminated array of pointers to word analyses.
pOutputFP_in
FILE
pointer.
pszOutputFile_in
bWarnUnusedFd_in
TRUE
.
bVerbose_in
stderr
) if
TRUE
.
bWriteAmpleParses_in
TRUE
.
pTextControl_in
pPATR_in
the number of successful parses of the sentence
5.15.4 Example
    #include "patr.h"
    #include "opaclib.h"
    ...
    PATRData    sPATRData_g;
    TextControl sTextControl_g;
    ...
    void disambiguate(char * pszAnalysis_in, char * pszOutput_in)
    {
        FILE *          pInputFP;
        FILE *          pOutputFP;
        WordTemplate ** pSentence;
        unsigned        uiAmbiguityCount;
        unsigned        uiSentenceCount;
        unsigned        uiParseCount;

        if (sPATRData_g.pGrammar == NULL)
            return;
        pInputFP = fopen( pszAnalysis_in, "r");
        if (pInputFP == NULL)
            return;
        pOutputFP = fopen( pszOutput_in, "w" );
        if (pOutputFP == NULL)
        {
            fclose(pInputFP);
            return;
        }
        for ( uiSentenceCount = 0, uiParseCount = 0 ;; ++uiSentenceCount )
        {
            pSentence = readSentenceOfTemplates(pInputFP,
                                                pszAnalysis_in,
                                                ".!?",
                                                &sTextControl_g,
                                                sPATRData_g.pLogFP);
            if (pSentence == NULL)
                break;
            uiAmbiguityCount = parseAmpleSentenceWithPATR(
                                    pSentence, pOutputFP, pszOutput_in,
                                    FALSE, FALSE, TRUE,
                                    &sTextControl_g, &sPATRData_g);
            if (uiAmbiguityCount != 0)
                ++uiParseCount;
        }
        fprintf(stderr,
                "File parsing statistics: %u sentences read, %u parsed\n",
                uiSentenceCount, uiParseCount);
        fclose(pInputFP);
        fclose(pOutputFP);
    }
`patrampl.c'
#include "patr.h" PATRFeature * parsePATRFeatureString(char * pszField_in, PATRData * pPATR_in);
parsePATRFeatureString
creates a PC-PATR feature structure from
its representation as a set of feature path expressions.
The arguments to parsePATRFeatureString
are as follows:
pszField_in
pPATR_in
pointer to a PC-PATR feature structure
5.16.4 Example
    #include <string.h>
    #include "patr.h"
    #include "opaclib.h"
    ...
    PATRData sPATRData_g;
    ...
    PATRLabeledFeature * extractPATRLabeledFeature(char * pszField_in)
    {
        char *               p;
        PATRFeature *        pFeature;
        PATRLabeledFeature * pNewFdDef;

        /* the label ends at the first whitespace character */
        p = strpbrk(pszField_in, whiteSpace);
        if (p == NULL)
            return( NULL );
        *p++ = NUL;
        /* the rest of the field holds the feature path expressions */
        pFeature = parsePATRFeatureString(p, &sPATRData_g);
        if (pFeature == NULL)
            return( NULL );
        pNewFdDef = (PATRLabeledFeature *)allocMemory(
                                            sizeof(PATRLabeledFeature));
        pNewFdDef->pszLabel = duplicateString(pszField_in);
        pNewFdDef->pFeature = pFeature;
        pNewFdDef->pNext    = NULL;
        return( pNewFdDef );
    }
`patrfunc.c'
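The example above first splits its input field into a label and a remainder, and only the remainder goes to parsePATRFeatureString. The splitting step can be sketched on its own as follows. splitDemoLabel is a hypothetical helper, the whitespace set is an assumption (the example's whiteSpace symbol is not defined in this manual), and unlike the example it also skips any further whitespace before the remainder.

```c
#include <string.h>

/*
 * Terminate the leading label in pszField_io at its first whitespace
 * character and return a pointer to the text that follows, or NULL if
 * the field contains no whitespace.  The field is modified in place.
 */
char * splitDemoLabel(char * pszField_io)
{
    static const char szDemoWhiteSpace_s[] = " \t\r\n";  /* assumed set */
    char * p;

    if (pszField_io == NULL)
        return NULL;
    p = strpbrk(pszField_io, szDemoWhiteSpace_s);
    if (p == NULL)
        return NULL;
    *p++ = '\0';                            /* end of the label */
    p += strspn(p, szDemoWhiteSpace_s);     /* skip extra whitespace */
    return p;
}
```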
#include "patr.h" #include "ample.h" PATRLexItem * parseWithAmpleForPATRLexicon(char * pszWord_in, AmpleData * pAmple_in, PATRData * pPATR_in);
parseWithAmpleForPATRLexicon
parses the word using the AMPLE
information already loaded into memory. This provides an alternative
to creating a word lexicon file if an AMPLE analysis (with morpheme
lexicon files) already exists. It is commonly used as part of a
morphological parsing function passed to parseWithPATRLexicon
.
The arguments to parseWithAmpleForPATRLexicon
are as follows:
pszWord_in
pAmple_in
pPATR_in
a pointer to the node in the internal lexicon containing the newly
parsed word, or NULL
if it does not parse.
5.17.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "patr.h"
    #include "kimmo.h"
    #include "ample.h"
    ...
    PATRData  sPATRData_g;
    KimmoData sKimmoData_g;
    AmpleData sAmpleData_g;
    ...
    static PATRLexItem * tryMorphParse(char * pszWord_in)
    {
        if (sKimmoData_g.sPATR.pGrammar != NULL)
        {
            sKimmoData_g.pLogFP = sPATRData_g.pLogFP;
            return parseWithKimmoForPATRLexicon( pszWord_in,
                                                 &sKimmoData_g,
                                                 &sPATRData_g );
        }
        else if (sAmpleData_g.pDictionary != NULL)
        {
            sAmpleData_g.pLogFP = sPATRData_g.pLogFP;
            return parseWithAmpleForPATRLexicon( pszWord_in,
                                                 &sAmpleData_g,
                                                 &sPATRData_g );
        }
        else
            return NULL;
    }

    void parseFile(char * pszInput_in, char * pszOutput_in,
                   char * pszLexicon_in)
    {
        FILE *   pInputFP;
        FILE *   pOutputFP;
        char *   pszLine;
        unsigned uiLine;
        unsigned uiSentences;
        unsigned uiParsed;
        unsigned uiAmbiguityCount;

        if (sPATRData_g.pGrammar == NULL)
            return;
        if (    (sPATRData_g.pLexicon        == NULL) &&
                (sKimmoData_g.sPATR.pGrammar == NULL) &&
                (sAmpleData_g.pDictionary    == NULL) )
            return;
        pInputFP = fopen( pszInput_in, "r");
        if (pInputFP == NULL)
            return;
        pOutputFP = fopen( pszOutput_in, "w" );
        if (pOutputFP == NULL)
        {
            fclose(pInputFP);
            return;
        }
        for ( uiLine = 1, uiSentences = 0, uiParsed = 0 ;; ++uiSentences )
        {
            pszLine = readLineFromFile(pInputFP, &uiLine,
                                       sPATRData_g.cComment);
            if (pszLine == NULL)
                break;
            pszLine += strspn(pszLine, " \t\r\n");
            if (*pszLine == '\0')
                continue;
            trimTrailingWhitespace(pszLine);
            fprintf(pOutputFP, "%s\n", pszLine);
            uiAmbiguityCount = parseWithPATRLexicon(pszLine, pOutputFP,
                                                    tryMorphParse, TRUE,
                                                    &sPATRData_g);
            if (uiAmbiguityCount != 0)
                ++uiParsed;
        }
        fprintf(stderr,
                "File parsing statistics: %u sentences read, %u parsed\n",
                uiSentences, uiParsed);
        fclose(pInputFP);
        fclose(pOutputFP);
        /*
         *  save the lexicon entries generated by the morphological parsers
         */
        if (pszLexicon_in != NULL)
        {
            pOutputFP = fopen(pszLexicon_in, "w");
            if (pOutputFP != NULL)
            {
                writePATRLexicon(pOutputFP, &sPATRData_g);
                fclose(pOutputFP);
            }
        }
    }
`patrampl.c'
    #include "patr.h"
    #include "kimmo.h"

    PATRLexItem * parseWithKimmoForPATRLexicon(char *      pszWord_in,
                                               KimmoData * pKimmo_in,
                                               PATRData *  pPATR_in);
parseWithKimmoForPATRLexicon parses the word using the PC-Kimmo information already loaded into memory. This provides an alternative to creating a word lexicon file if a PC-Kimmo analysis (with morpheme lexicon files) already exists. It is commonly used as part of a morphological parsing function passed to parseWithPATRLexicon.
The arguments to parseWithKimmoForPATRLexicon
are as follows:
pszWord_in
pKimmo_in
pPATR_in
a pointer to the node in the internal lexicon containing the newly
parsed word, or NULL
if it does not parse.
5.18.4 Example See section 5.17 parseWithAmpleForPATRLexicon.
5.18.5 Source File `parsepwk.c'
#include "patr.h" PATREdgeList * parseWithPATR(PATRWord * pSentence_in, int * piStage_out, PATRData * pPATR_io);
parseWithPATR is the primary parsing routine in the PC-PATR function library. It is a chart parser with these properties:
The arguments to parseWithPATR are as follows:
pSentence_in
a pointer to a linked list of PATRWord data structures representing a sentence.
piStage_out
if not NULL, the integer it points to is set to one of these values:
pPATR_io
a pointer to the parse chart constructed, or NULL if the parse fails
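The list of stage values is not reproduced here, but the example for writePATRParses (section 5.26) switches on the values 0 through 6 and prints a message for each nonzero one. The following is a minimal sketch of that mapping, assuming those messages reflect the intended meanings. The helper name describeParseStage is hypothetical and is not part of the PC-PATR library.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PC-PATR function): translate the value stored
 * through piStage_out into the diagnostic text used by the section 5.26
 * example.  Returns NULL for 0 (a normal parse) and for unknown values.
 */
const char * describeParseStage(int iStage_in)
{
    switch (iStage_in) {
    case 1:  return "Turning off unification";
    case 2:  return "Turning off top-down filtering";
    case 3:  return "Building the largest parse \"bush\"";
    case 4:  return "No output available";
    case 5:  return "Out of Memory";
    case 6:  return "Out of Time";
    default: return NULL;    /* 0 or a value this sketch does not know */
    }
}
```

A caller would pass the integer filled in by parseWithPATR and print the returned text only when it is not NULL.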
5.19.4 Example
#include <stdio.h>
#include <string.h>
#include "patr.h"
#include "opaclib.h"

struct lex_item {
    char *         pszWord;
    char *         pszGloss;
    char *         pszCat;
    unsigned int * puiFeatures;
};
...
char **  ppszFeatureNames_g;
PATRData sPATRData_g;
TRIE *   pLexicon_g;
...
PATREdgeList * parse(char * pszSentence_in)
{
    PATREdgeList *    pResult   = NULL;
    PATRWord *        pSentence = NULL;
    PATRWord *        pNewWord;
    PATRWord *        pPrevWord = NULL;
    char *            pszWord;
    struct lex_item * pLexItem;

    if (pszSentence_in == NULL)
        return NULL;
    /*
     * save pointers to temporary parse structures
     */
    markPATRParseGarbage(&sPATRData_g);
    /*
     * convert the sentence string to what parseWithPATR() wants
     */
    for (   pszWord = strtok(pszSentence_in, " \t\n") ;
            pszWord ;
            pszWord = strtok(NULL, " \t\n") ) {
        pLexItem = findDataInTRIE(pLexicon_g, pszWord);
        if (pLexItem == NULL) {
            reportError(ERROR_MSG,
                        "Cannot find \"%s\" in the lexicon\n", pszWord);
            collectPATRParseGarbage(&sPATRData_g);
            return NULL;
        }
        pNewWord = buildPATRWordForKimmo(pszWord,
                                         pLexItem->pszGloss,
                                         pLexItem->pszCat,
                                         pLexItem->puiFeatures,
                                         ppszFeatureNames_g,
                                         &sPATRData_g);
        if (pPrevWord == NULL)              /* If first (no prev) */
            pSentence = pNewWord;           /* Set head to this */
        else
            pPrevWord->pNext = pNewWord;    /* Else link from prev */
        pPrevWord = pNewWord;               /* Set prev to this */
    }
    if (pSentence != NULL) {
        /*
         * parse the sentence and save a permanent copy of the result
         */
        int iStage;
        pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
        if (pResult != NULL)
            pResult = storePATREdgeList(pResult, &sPATRData_g);
    }
    /*
     * free any temporary parse structures
     */
    collectPATRParseGarbage(&sPATRData_g);
    return pResult;
}

void processFile(char * pszFilename_in)
{
    char *         pszLine;
    FILE *         pInputFP;
    PATREdgeList * pParse;

    pInputFP = fopen(pszFilename_in, "r");
    if (pInputFP == NULL)
        return;
    while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) {
        pParse = parse(pszLine);
        if (pParse != NULL) {
            ...
            freePATREdgeList(pParse, &sPATRData_g);
        }
    }
    fclose(pInputFP);
}
`lcparse.c'
#include "patr.h"

int parseWithPATRLexicon(char * pszSentence_in,
                         FILE * pOutputFP_in,
                         PATRLexItem * (* pfMorphParser_in)(char * pszWord_in),
                         int bWarnUnusedFd_in,
                         PATRData * pPATR_in);
parseWithPATRLexicon
The arguments to parseWithPATRLexicon are as follows:
pszSentence_in
pOutputFP_in
an output FILE pointer.
pfMorphParser_in
If pfMorphParser_in is NULL, then no morphological parsing is done as a backup to the internal PC-PATR lexicon.
bWarnUnusedFd_in
TRUE.
pPATR_in
the number of valid parses found for the sentence
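The role of pfMorphParser_in as a backup to the internal lexicon can be illustrated with a small stand-alone sketch. It assumes the lookup order implied above: the internal lexicon is consulted first, and the callback is tried only when the word is missing and the callback is not NULL. The LexItem type, the table, and both functions are hypothetical stand-ins and do not reproduce the library's internals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PATRLexItem; the real type is defined in patr.h. */
typedef struct LexItem {
    const char * pszWord;
} LexItem;

/* Same shape as the pfMorphParser_in callback: a word in, a lexical item out. */
typedef LexItem * (* MorphParser)(char * pszWord_in);

/* Toy "internal lexicon" for the sketch. */
static LexItem aLexicon_m[] = { { "dog" }, { "barks" } };

/* Look in the internal lexicon first; fall back to the callback if given. */
LexItem * lookupWord(char * pszWord_in, MorphParser pfMorphParser_in)
{
    size_t i;

    for (i = 0; i < sizeof aLexicon_m / sizeof aLexicon_m[0]; ++i) {
        if (strcmp(aLexicon_m[i].pszWord, pszWord_in) == 0)
            return &aLexicon_m[i];
    }
    if (pfMorphParser_in != NULL)
        return pfMorphParser_in(pszWord_in);   /* backup analysis */
    return NULL;                               /* unknown word */
}

/* Toy callback: pretends to analyze any word ending in 's'. */
LexItem * toyMorphParser(char * pszWord_in)
{
    static LexItem sItem;
    size_t         uiLength = strlen(pszWord_in);

    if ((uiLength == 0) || (pszWord_in[uiLength - 1] != 's'))
        return NULL;
    sItem.pszWord = pszWord_in;
    return &sItem;
}
```

With a NULL callback, a word missing from the table simply fails; with toyMorphParser supplied, the same word may still be resolved. This mirrors how tryMorphParse is passed to parseWithPATRLexicon in the section 5.17 example.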
5.20.4 Example
See also section 5.17 parseWithAmpleForPATRLexicon.
#include <stdio.h>
#include "patr.h"
#include "opaclib.h"

PATRData sPATRData_g;
...
void process(char * pszGrammar_in, char * pszLexicon_in,
             char * pszInput_in, char * pszOutput_in)
{
    FILE * pInputFP;
    FILE * pOutputFP;
    char * pszLine;
    int    iSentenceCount;
    int    iParseCount;

    if (loadPATRGrammar(pszGrammar_in, &sPATRData_g) == 0)
        return;
    if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0) {
        pInputFP = fopen(pszInput_in, "r");
        if (pInputFP != NULL) {
            pOutputFP = fopen(pszOutput_in, "w");
            if (pOutputFP != NULL) {
                iSentenceCount = 0;
                while ((pszLine = readLineFromFile(pInputFP,
                                                   NULL, '\0')) != NULL) {
                    ++iSentenceCount;
                    /* no morphological parser, no unused-feature warnings */
                    iParseCount = parseWithPATRLexicon(pszLine, pOutputFP,
                                                       NULL, FALSE,
                                                       &sPATRData_g);
                    showAmbiguousProgress(iParseCount, iSentenceCount);
                }
                fclose(pOutputFP);
            }
            fclose(pInputFP);
        }
        freePATRLexicon(&sPATRData_g);
    }
    freePATRGrammar(&sPATRData_g);
    freePATRInternalMemory(&sPATRData_g);
}
`patrlexi.c'
#include "patr.h"

void showPATRLexicon(PATRData * pPATR_in);
showPATRLexicon writes the internal PC-PATR lexicon to the standard output stream (stdout). This is useful only for debugging purposes, if then.
showPATRLexicon has only one argument:
pPATR_in
none
5.21.4 Example
#include "patr.h"

PATRData sPATRData_g;
...
void test_lexicon(char * pszLexicon_in)
{
    /* display the lexicon only if it loaded successfully */
    if (loadPATRLexicon(pszLexicon_in, &sPATRData_g) != 0) {
        showPATRLexicon(&sPATRData_g);
    }
}
`patrlexi.c'
#include "patr.h"

PATREdgeList * storePATREdgeList(PATREdgeList * pPATRResult_in,
                                 PATRData * pPATR_io);
storePATREdgeList makes a permanent (unaffected by garbage collection) copy of a parse chart. It should be called after parseWithPATR and before collectPATRParseGarbage. Note that freePATREdgeList is used to free the memory allocated by storePATREdgeList.
The arguments to storePATREdgeList are as follows:
pPATRResult_in
a parse chart returned by parseWithPATR.
pPATR_io
a pointer to a newly allocated copy of the parse chart (PATREdgeList structure)
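The required ordering (mark, parse, store, collect) can be illustrated with a toy model. The sketch below is only an analogy under stated assumptions: temporary data lives in a scratch arena that is wiped back to a mark, while the stored copy lives on the heap and survives. None of these functions belong to PC-PATR, and the library's actual garbage collection mechanism is not described here.

```c
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE 256

static char   acArena_m[ARENA_SIZE];   /* scratch space for temporary data */
static size_t uiUsed_m = 0;            /* bytes currently in use */
static size_t uiMark_m = 0;            /* position saved by markGarbage() */

/* Remember where temporary allocations start (cf. markPATRParseGarbage). */
void markGarbage(void)
{
    uiMark_m = uiUsed_m;
}

/* Make a temporary copy in the arena; it dies at the next collection. */
char * tempCopy(const char * psz_in)
{
    size_t uiSize = strlen(psz_in) + 1;
    char * p;

    if (uiUsed_m + uiSize > ARENA_SIZE)
        return NULL;
    p = acArena_m + uiUsed_m;
    memcpy(p, psz_in, uiSize);
    uiUsed_m += uiSize;
    return p;
}

/* Make a permanent heap copy (cf. storePATREdgeList); release with free(). */
char * storeCopy(const char * pszTemp_in)
{
    size_t uiSize = strlen(pszTemp_in) + 1;
    char * p      = malloc(uiSize);

    if (p != NULL)
        memcpy(p, pszTemp_in, uiSize);
    return p;
}

/* Wipe everything allocated since the mark (cf. collectPATRParseGarbage). */
void collectGarbage(void)
{
    memset(acArena_m + uiMark_m, 0, uiUsed_m - uiMark_m);
    uiUsed_m = uiMark_m;
}
```

In this model, calling storeCopy after collectGarbage would copy wiped data, which is the reason the store step has to come before the collection step.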
5.22.4 Example
See section 5.19 parseWithPATR.
5.22.5 Source File
`patalloc.c'
#include "patr.h"

PATRFeature * storePATRFeature(PATRFeature * pFeature_in,
                               PATRData * pPATR_in);
storePATRFeature makes a permanent (unaffected by garbage collection) copy of a feature structure. Note that freePATRFeature is used to free the memory allocated by storePATRFeature.
The arguments to storePATRFeature are as follows:
pFeature_in
pPATR_io
a pointer to a newly allocated copy of the feature structure
5.23.4 Example
/*FIX ME -- THIS NEEDS TO BE WRITTEN!*/
`patalloc.c'
#include "patr.h"

int stringifyPATRParses(PATREdgeList * pParses_in,
                        PATRData * pPATR_in,
                        const char * pszSentence_in,
                        char ** ppszBuffer_out);
stringifyPATRParses creates a character string representation of a parse chart. The output string contains both the parse trees and the set of features indicated by the settings in the data structure pointed to by pPATR_in.
The arguments to stringifyPATRParses are as follows:
pParses_in
a parse chart returned by parseWithPATR.
pPATR_in
pszSentence_in
NULL.
ppszBuffer_out
NULL or the address of dynamically allocated memory containing the character string representation of the parse chart.
-1 if an error occurs, or 0 if successful
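Because the result is delivered through ppszBuffer_out, the caller passes the address of a char * variable rather than the variable itself. The following self-contained sketch shows that calling convention with a hypothetical function, stringifyDemo, which merely copies its input. It is not the library routine, and it uses free() where the PC-PATR examples use freeMemory.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical function using the same convention as stringifyPATRParses:
 * the string is returned through ppszBuffer_out, and the return value is
 * 0 on success or -1 on error.  On error *ppszBuffer_out is set to NULL.
 */
int stringifyDemo(const char * pszText_in, char ** ppszBuffer_out)
{
    size_t uiSize;

    if (ppszBuffer_out == NULL)
        return -1;                      /* nowhere to store the result */
    *ppszBuffer_out = NULL;
    if (pszText_in == NULL)
        return -1;
    uiSize = strlen(pszText_in) + 1;
    *ppszBuffer_out = malloc(uiSize);
    if (*ppszBuffer_out == NULL)
        return -1;                      /* out of memory */
    memcpy(*ppszBuffer_out, pszText_in, uiSize);
    return 0;
}
```

A caller declares `char * pszResult = NULL;`, passes `&pszResult`, checks the return value, and releases the buffer when finished.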
5.24.4 Example
#include <stdio.h>
#include <string.h>
#include "patr.h"
#include "opaclib.h"

struct lex_item {
    char *         pszWord;
    char *         pszGloss;
    char *         pszCat;
    unsigned int * puiFeatures;
};
...
char **  ppszFeatureNames_g;
PATRData sPATRData_g;
TRIE *   pLexicon_g;
...
char * parse(char * pszSentence_in)
{
    PATREdgeList *    pResult   = NULL;
    PATRWord *        pSentence = NULL;
    PATRWord *        pNewWord;
    PATRWord *        pPrevWord = NULL;
    char *            pszWord;
    struct lex_item * pLexItem;
    char *            pszResult = NULL;

    if (pszSentence_in == NULL)
        return NULL;
    /*
     * save pointers to temporary parse structures
     */
    markPATRParseGarbage(&sPATRData_g);
    /*
     * convert the sentence string to what parseWithPATR() wants
     */
    for (   pszWord = strtok(pszSentence_in, " \t\n") ;
            pszWord ;
            pszWord = strtok(NULL, " \t\n") ) {
        pLexItem = findDataInTRIE(pLexicon_g, pszWord);
        if (pLexItem == NULL) {
            reportError(ERROR_MSG,
                        "Cannot find \"%s\" in the lexicon\n", pszWord);
            collectPATRParseGarbage(&sPATRData_g);
            return NULL;
        }
        pNewWord = buildPATRWordForKimmo(pszWord,
                                         pLexItem->pszGloss,
                                         pLexItem->pszCat,
                                         pLexItem->puiFeatures,
                                         ppszFeatureNames_g,
                                         &sPATRData_g);
        if (pPrevWord == NULL)              /* If first (no prev) */
            pSentence = pNewWord;           /* Set head to this */
        else
            pPrevWord->pNext = pNewWord;    /* Else link from prev */
        pPrevWord = pNewWord;               /* Set prev to this */
    }
    if (pSentence != NULL) {
        /*
         * parse the sentence and convert the result to a string
         */
        int iStage;
        pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
        if (pResult != NULL) {
            /* pass the address of pszResult so that it can be filled in */
            stringifyPATRParses(pResult, &sPATRData_g, NULL, &pszResult);
        }
    }
    /*
     * free any temporary parse structures
     */
    collectPATRParseGarbage(&sPATRData_g);
    return pszResult;
}

void processFile(char * pszFilename_in)
{
    char * pszLine;
    FILE * pInputFP;
    char * pszParse;

    pInputFP = fopen(pszFilename_in, "r");
    if (pInputFP == NULL)
        return;
    while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) {
        pszParse = parse(pszLine);
        if (pszParse != NULL) {
            ...
            freeMemory(pszParse);
        }
    }
    fclose(pInputFP);
}
`patrstrg.c'
#include "patr.h"

void writePATRLexicon(FILE * pOutputFP_in,
                      PATRData * pPATR_in);
writePATRLexicon writes the internal PC-PATR lexicon to a file in a form suitable for reloading with loadPATRLexicon. This is most useful when a morphological parser is used to populate the lexicon. See section 5.17 parseWithAmpleForPATRLexicon, section 5.18 parseWithKimmoForPATRLexicon, and section 5.20 parseWithPATRLexicon.
The arguments to writePATRLexicon are as follows:
pOutputFP_in
an output FILE pointer.
pPATR_in
none
5.25.4 Example
See section 5.17 parseWithAmpleForPATRLexicon.
5.25.5 Source File
`patrlexi.c'
#include "patr.h"

void writePATRParses(PATREdgeList * pParses_in,
                     FILE * pOutputFP_in,
                     PATRData * pPATR_in);
writePATRParses writes the parse trees and associated features from the parse chart pointed to by pParses_in. How many parse trees are written, and how they are displayed, is controlled by pPATR_in->iMaxAmbiguities and pPATR_in->eTreeDisplay. The bits in pPATR_in->iFeatureDisplay control which features are written, and how they are displayed in the output file.
The arguments to writePATRParses are as follows:
pParses_in
a parse chart returned by parseWithPATR.
pOutputFP_in
an output FILE pointer.
pPATR_in
none
5.26.4 Example
#include <stdio.h>
#include <string.h>
#include "patr.h"
#include "opaclib.h"

struct lex_item {
    char *         pszWord;
    char *         pszGloss;
    char *         pszCat;
    unsigned int * puiFeatures;
};
...
char **  ppszFeatureNames_g;
PATRData sPATRData_g;
TRIE *   pLexicon_g;
...
void parse(char * pszSentence_in, FILE * pOutputFP_in)
{
    PATREdgeList *    pResult   = NULL;
    PATRWord *        pSentence = NULL;
    PATRWord *        pNewWord;
    PATRWord *        pPrevWord = NULL;
    char *            pszWord;
    struct lex_item * pLexItem;
    unsigned          uiParseCount = 0;

    if ((pszSentence_in == NULL) || (pOutputFP_in == NULL))
        return;
    fprintf(pOutputFP_in, "%s\n", pszSentence_in);
    /*
     * save pointers to temporary parse structures
     */
    markPATRParseGarbage(&sPATRData_g);
    /*
     * convert the sentence string to what parseWithPATR() wants
     */
    for (   pszWord = strtok(pszSentence_in, " \t\n") ;
            pszWord ;
            pszWord = strtok(NULL, " \t\n") ) {
        pLexItem = findDataInTRIE(pLexicon_g, pszWord);
        if (pLexItem == NULL) {
            reportError(ERROR_MSG,
                        "Cannot find \"%s\" in the lexicon\n", pszWord);
            collectPATRParseGarbage(&sPATRData_g);
            return;
        }
        pNewWord = buildPATRWordForKimmo(pszWord,
                                         pLexItem->pszGloss,
                                         pLexItem->pszCat,
                                         pLexItem->puiFeatures,
                                         ppszFeatureNames_g,
                                         &sPATRData_g);
        if (pPrevWord == NULL)              /* If first (no prev) */
            pSentence = pNewWord;           /* Set head to this */
        else
            pPrevWord->pNext = pNewWord;    /* Else link from prev */
        pPrevWord = pNewWord;               /* Set prev to this */
    }
    if (pSentence != NULL) {
        /*
         * parse the sentence and write the result
         */
        int            iStage;
        const char *   psz = NULL;
        PATREdgeList * pel;

        pResult = parseWithPATR(pSentence, &iStage, &sPATRData_g);
        if (iStage != 0)
            fprintf(pOutputFP_in, "**** Cannot parse this sentence ****\n");
        switch (iStage) {
        case 0:
            for ( pel = pResult ; pel ; pel = pel->pNext )
                ++uiParseCount;
            break;
        case 1:
            psz = "**** Turning off unification ****\n";
            break;
        case 2:
            psz = "**** Turning off top-down filtering ****\n";
            break;
        case 3:
            psz = "**** Building the largest parse \"bush\" ****\n";
            break;
        case 4:
            psz = "**** No output available ****\n";
            break;
        case 5:
            psz = "**** Out of Memory (after %lu edges) ****\n";
            break;
        case 6:
            psz = "**** Out of Time (after %lu edges) ****\n";
            break;
        }
        if (psz)
            fprintf(pOutputFP_in, psz, sPATRData_g.uiEdgesAdded);
        if (pResult) {
            writePATRParses(pResult, pOutputFP_in, &sPATRData_g);
            putc('\n', pOutputFP_in);
        }
    }
    else {
        fprintf(pOutputFP_in, "**** Nothing to parse ****\n");
    }
    /*
     * free any temporary parse structures
     */
    collectPATRParseGarbage(&sPATRData_g);
}

void processFile(char * pszInput_in, char * pszOutput_in)
{
    char * pszLine;
    FILE * pInputFP;
    FILE * pOutputFP;

    pInputFP = fopen(pszInput_in, "r");
    if (pInputFP == NULL)
        return;
    pOutputFP = fopen(pszOutput_in, "w");
    if (pOutputFP == NULL) {
        fclose(pInputFP);
        return;
    }
    while ((pszLine = readLineFromFile(pInputFP, NULL, '\0')) != NULL) {
        parse(pszLine, pOutputFP);
    }
    fclose(pInputFP);
    fclose(pOutputFP);
}
`userpatr.c'
#include "patr.h"

void writePATRStyledOutput(PATREdgeList * pParses_in,
                           char * pszWord_in,
                           char * pszLex_in,
                           char * pszGloss_in,
                           FILE * pOutputFP_in,
                           PATRFeatureTags * pFeatTags_in,
                           char * pszParseStartTag_in,
                           char * pszParseEndTag_in,
                           PATRData * pPATR_in,
                           unsigned * puiAmbigCount_io);
writePATRStyledOutput writes the parse trees and associated features from the parse chart pointed to by pParses_in in a highly stylized fashion. (It was written for KTAGGER and may not be useful for any other purpose.)
The arguments to writePATRStyledOutput are as follows:
pParses_in
a parse chart returned by parseWithPATR. Each parse tree is written as the value of the <TREE> feature referenced by pFeatTags_in, and its top level feature structure is written as the value of the <FEAT> feature referenced by pFeatTags_in.
pszWord_in
the word parsed by parseWithPATR. It is written as the value of the <WORD> feature referenced by pFeatTags_in.
pszLex_in
written as the value of the <LEX> feature referenced by pFeatTags_in.
pszGloss_in
written as the value of the <GLOSS> feature referenced by pFeatTags_in.
pOutputFP_in
an output FILE pointer.
pFeatTags_in
<TREE>, <FEAT>, <WORD>, <LEX>, or <GLOSS>) are matched against the top level feature structure associated with the current parse.
pszParseStartTag_in
pszParseEndTag_in
pPATR_in
puiAmbigCount_io
pParses_in. The number is added to by writePATRStyledOutput.
none
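Since puiAmbigCount_io is an in/out argument that is added to rather than overwritten, the caller is responsible for initializing it before the first call, as the example in this section does inside its loop header. The sketch below shows the accumulation pattern in isolation. The EdgeList type and addParseCount are hypothetical stand-ins that model only the pNext link of PATREdgeList.

```c
#include <stddef.h>

/* Hypothetical stand-in for PATREdgeList: only the pNext link is modeled. */
typedef struct EdgeList {
    struct EdgeList * pNext;
} EdgeList;

/*
 * Add the number of parses in the list to *puiAmbigCount_io.  The counter
 * is never reset here, so repeated calls accumulate a running total.
 */
void addParseCount(const EdgeList * pParses_in, unsigned * puiAmbigCount_io)
{
    const EdgeList * p;

    if (puiAmbigCount_io == NULL)
        return;
    for ( p = pParses_in ; p != NULL ; p = p->pNext )
        ++*puiAmbigCount_io;
}
```

Calling the function once per analysis of a word, with the counter set to zero beforehand, leaves the total number of parses for that word in the counter.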
5.27.4 Example
#include <stdio.h>
#include <string.h>
#include "patr.h"
#include "kimmo.h"
#include "opaclib.h"
...
KimmoData         sKimmoData_g;
PATRFeatureTags * pFeatureTags_g;
...
void process(char * pszInput_in, char * pszOutput_in)
{
    char *        pszLine;
    char *        pszWord;
    KimmoResult * pKimmoResults;
    KimmoResult * pResult;
    char *        pszMorphGlosses = NULL;
    char *        pszMorphLexes   = NULL;
    unsigned      uiAmbiguityCount;
    FILE *        pInputFP;
    FILE *        pOutputFP;

    pInputFP = fopen(pszInput_in, "r");
    if (pInputFP == NULL)
        return;
    pOutputFP = fopen(pszOutput_in, "w");
    if (pOutputFP == NULL) {
        fclose(pInputFP);
        return;
    }
    while ((pszLine = readLineFromFile(pInputFP, NULL, 0)) != NULL) {
        /* skip leading whitespace; strspn() returns a length, not a pointer */
        pszWord = pszLine + strspn(pszLine, " \t\r\n\f");
        if (*pszWord == '\0')
            continue;
        trimTrailingWhitespace(pszWord);
        fprintf(pOutputFP, "<word>\n");
        pKimmoResults = applyKimmoRecognizer((unsigned char *)pszWord,
                                             &sKimmoData_g);
        for (   pResult = pKimmoResults, uiAmbiguityCount = 0 ;
                pResult ;
                pResult = pResult->pNext ) {
            pszMorphLexes = (char *)concatKimmoMorphLexemes(
                                pResult->pAnalysis, "", &sKimmoData_g);
            pszMorphGlosses = (char *)concatKimmoMorphGlosses(
                                pResult->pAnalysis, "", &sKimmoData_g);
            writePATRStyledOutput(pResult->pParseChart,
                                  pszWord, pszMorphLexes, pszMorphGlosses,
                                  pOutputFP, pFeatureTags_g,
                                  "<parse>", "</parse>",
                                  &sKimmoData_g.sPATR,
                                  &uiAmbiguityCount);
            fprintf(pOutputFP, "\n");
            freeMemory(pszMorphLexes);
            freeMemory(pszMorphGlosses);
        }
        if (pKimmoResults == NULL)
            fprintf(pOutputFP, "<parse>*** %s ***</parse>\n", pszWord);
        else
            freeKimmoResult( pKimmoResults );
        fprintf(pOutputFP, "</word>\n");
    }
    fclose(pInputFP);
    fclose(pOutputFP);
}
`wrtstyle.c'
This document was generated on 20 March 2003 using texi2html 1.56k.