PC-Kimmo Function Library Reference Manual

two-level processor functions for morphological analysis
version 2.1.0
October 1997

by Stephen McConnel

Copyright (C) 2000 SIL International

Published by:
Language Software Development
SIL International
7500 W. Camp Wisdom Road
Dallas, TX 75236
U.S.A.

Permission is granted to make and distribute verbatim copies of this file provided the copyright notice and this permission notice are preserved in all copies.

The author may be reached at the address above or via email as `steve@acadcomp.sil.org'.

Introduction to the PC-Kimmo function library
*********************************************

One of the original design goals of the PC-Kimmo program was to produce a reusable function library that could be used in different programs or with different user interfaces. The functions and data structures described in this manual are a result of that design goal.

The PC-Kimmo function library can be used for programs that need to handle the morphology and phonology of natural language using the two-level morphology originally invented by Kimmo Koskenniemi. (His use of the term morphology should be understood to encompass both what linguists would consider morphology proper, that is, the decomposition of words into morphemes, and phonology, at least in the sense of morphophonemics.)

The author would appreciate feedback directed to the following address:

     Stephen McConnel                 (972)708-7361 (office)
     Language Software Development    (972)708-7561 (fax)
     SIL International
     7500 W. Camp Wisdom Road
     Dallas, TX 75236                 steve@acadcomp.sil.org
     U.S.A.                        or steve.mcconnel@sil.org

Variable and function naming conventions
****************************************

The basic goal behind choosing names in the PC-Kimmo function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name.
Preprocessor macro names ======================== Preprocessor macro names are written entirely in capital letters. If the name requires more than one word for an adequate description, the words are joined together with intervening underscore (`_') characters. Data structure names ==================== Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, depending on the capitalization pattern to make them readable as separate words. Variable names ============== Variable names in the PC-Kimmo function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book `Code Complete' on pages 202-206. Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix. Type prefix ----------- The type prefix has the following basic possibilities: `b' a Boolean, usually encoded as a `char', `short', or `int' `c' a character, usually a `char' but sometimes a `short' or `int' `d' a double precision floating point number, that is, a `double' `e' an enumeration, encoded as an `enum' or as a `char', `short', or `int' `i' an integer, that is, an `int', `short', `long', or (rarely) `char' `s' a data structure defined by a `struct' statement `sz' a NUL (that is, zero) terminated character string `pf' a pointer to a function In addition, the basic types may be prefixed by these qualifiers: `u' indicates that an integer or a character is unsigned `a' indicates an array of the basic type `p' indicates a pointer to the type, possibly a pointer to an array or to a pointer Descriptive name ---------------- The descriptive name portion of a variable name consists of one or more capitalized words concatenated together. There are no underscores (`_') separating these words from each other, or from the type prefix. 
For the PC-Kimmo function library, the descriptive name for global variables begins with `Kimmo'.

Scope suffix
------------

The scope suffix has these possibilities:

`_g'
     indicates a global variable accessible throughout the program
`_m'
     indicates a module (semiglobal) variable accessible throughout the file (declared `static')
`_in'
     indicates a function argument used for input
`_out'
     indicates a function argument used for output (must be a pointer)
`_io'
     indicates a function argument used for both input and output (must be a pointer)
`_s'
     indicates a function variable that retains its value between calls (declared `static')

The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call.

Function names
==============

Global function names in the PC-Kimmo function library have two parts: a verb that is all lowercase followed by a noun phrase containing one or more capitalized words. These pieces are concatenated without any intervening underscores (`_'). For the PC-Kimmo library functions, the noun phrase section includes `Kimmo'.

Examples
========

Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.

`SAMPLE_NAME'
     is a preprocessor macro.
`SampleName'
     is a data structure.
`pSampleName'
     is a local pointer variable.
`writeSampleName'
     is a function (that may apply to a data structure named `SampleName').

PC-Kimmo data structures
************************

The PC-Kimmo functions operate on a number of different data structures. The most important of these are described in the following sections. The PC-Kimmo functions also use a number of other data structures internally, but it should not be necessary for a programmer to manipulate them directly.
KimmoData
=========

Definition
----------

     #include <stdio.h>      /* needed for FILE */
     /*
      *  type definition for KimmoData
      *  needed for patr.h
      */
     typedef struct kimmo_data KimmoData;
     #include "patr.h"       /* needed for PATRData */
     /*
      *  forward declarations for internal data types
      */
     typedef struct kimmo_alternation KimmoAlternation;
     typedef struct kimmo_lexicon     KimmoLexicon;
     typedef struct kimmo_pair        KimmoPair;
     typedef struct kimmo_rule        KimmoRule;
     typedef struct kimmo_subset      KimmoSubset;

     struct kimmo_data {
         /*
          *  parameters for controlling the PC-Kimmo processing
          */
         char                bLimit;
         char                iTraceLevel;
         char                bUsePATR;
         char                bSilent;
         char                bShowWarnings;
         char                bAlignment;
         unsigned char       cGlossBegin;
         unsigned char       cGlossEnd;
         unsigned char       cComment;
         FILE *              pLogFP;
         /*
          *  loaded or derived from the rules file
          */
         unsigned char **    ppszAlphabet;
         unsigned short      uiAlphabetSize;
         unsigned char       cNull;
         unsigned char       cAny;
         unsigned char       cBoundary;
         char                bTwoLCFile;
         KimmoSubset *       pSubsets;
         unsigned short      uiSubsetCount;
         KimmoRule *         pAutomata;
         unsigned short      uiAutomataSize;
         KimmoPair *         pFeasiblePairs;
         unsigned short      uiFeasiblePairsCount;
         char *              pszRulesFile;
         /*
          *  loaded or derived from the lexicon file
          */
         KimmoAlternation *  pAlternations;
         unsigned short      uiAlternationCount;
         KimmoLexicon *      pLexiconSections;
         KimmoLexicon *      pInitialLexicon;
         unsigned short      uiLexiconSectionCount;
         unsigned char **    ppszFeatures;
         unsigned short      uiFeatureCount;
         char *              pszLexiconFile;
         /*
          *  loaded or derived from the grammar file
          */
         PATRData            sPATR;
     };

Description
-----------

The `KimmoData' data structure collects the information used for data processing within the PC-Kimmo functions. Its general purpose is to reduce the number of parameters needed by the various functions.

`bLimit'
     limits the processing to a single good result if `TRUE' (nonzero).
`iTraceLevel'
     is the degree of tracing output desired from PC-Kimmo processes (0 means none).
`bUsePATR'
     causes the word grammar to be applied to the output of the two-level processor.
(This requires that a grammar file be loaded). `bSilent' disables messages to the "standard error" stream (`stderr'). `bShowWarnings' enables warning messages as well as error messages. `bAlignment' causes recognizer file or screen output to be aligned vertically (underlying form above gloss) if `TRUE'. `cGlossBegin' `cGlossEnd' are the characters that optionally surround morphnames in synthesizer input strings. `cComment' is the character that begins a comment in an input line. (`KIMMO_DEFAULT_COMMENT' is a symbol for the default value.) `pLogFP' is the `FILE' pointer for an output log file (`NULL' means none). `ppszAlphabet' points to a dynamically allocated array of alphabetic characters. The "characters" are stored as strings since they may be digraphs, trigraphs, etc. A maximum of 252 different characters may be stored. The array is `NULL' terminated. `uiAlphabetSize' is the number of alphabetic "characters" in the array. `cNull' is the "null" character. `cAny' is the "wild card" character that matches any character. `cBoundary' is the word boundary character that matches the beginning or the end of a word. `bTwoLCFile' records whether the rules were loaded from a TwoLC type rules file. `pSubsets' points to a dynamically allocated array of alphabet subset data structures. `uiSubsetCount' is the number of alphabet subsets in the array. `pAutomata' points to a dynamically allocated array of two-level automata data structures. `uiAutomataSize' is the number of rules in the automata array. `pFeasiblePairs' points to a dynamically allocated array of feasible pair data structures. `uiFeasiblePairsCount' is the number of feasible pairs in the array. `pszRulesFile' points to the name of the current PC-Kimmo rules file (`NULL' means none). `pAlternations' points to a dynamically allocated array of lexicon section alternation (continuation class) data structures. `uiAlternationCount' is the number of alternations in the array. 
`pLexiconSections' points to an array of PC-Kimmo lexicon data structures, one for each section of the lexicon. `pInitialLexicon' points to the logical first ("INITIAL") section of lexicon. `uiLexiconSectionCount' is the number of sections (data structures) in the lexicon array. `ppszFeatures' points to a dynamically allocated array of possible feature (template) labels. `uiFeatureCount' is the number of feature labels in the array. `pszLexiconFile' points to the name of the current PC-Kimmo lexicon file (`NULL' means none). `sPATR' contains the information loaded from the grammar file, which includes the name of the current PC-Kimmo word grammar file. Source File ----------- `kimmo.h' KimmoResult =========== Definition ---------- /* * type definition for KimmoResult * needed for patr.h */ typedef struct kimmo_result KimmoResult; #include "patr.h" /* needed for PATREdgeList */ /* * forward declaration for internal data type */ typedef struct kimmo_morpheme KimmoMorpheme; struct kimmo_result { KimmoResult * pNext; unsigned char * pszSynthesis; KimmoMorpheme * pAnalysis; PATREdgeList * pParseChart; unsigned char * pszResult; unsigned char * pszGloss; short bOkay; }; Description ----------- The `KimmoResult' data structure contains a single result from one of the PC-Kimmo processing functions (`applyKimmoGenerator', `applyKimmoRecognizer', or `applyKimmoSynthesizer'). It can be used to build a linked list for ambiguous results. `pNext' points to the next result, if any. `pszSynthesis' points to a synthesized surface form created by `applyKimmoGenerator' or `applyKimmoSynthesizer'. `pAnalysis' points to a list of morpheme data structures created by `applyKimmoRecognizer'. `pParseChart' points to a PATR parse chart created by `applyKimmoRecognizer'. `pszResult' points to a result string created by `applyKimmoGenerator', `applyKimmoRecognizer', or `applyKimmoSynthesizer'. It differs from `pszSynthesis' in that it has the "null" characters removed. 
`pszGloss'
     points to a "gloss" string created by `applyKimmoRecognizer'.
`bOkay'
     is a Boolean variable made available for programmer use. It is set `FALSE' by `applyKimmoGenerator', `applyKimmoRecognizer', and `applyKimmoSynthesizer'.

Source File
-----------

`kimmo.h'

The PC-Kimmo function library global variables
**********************************************

This chapter gives the proper usage information about each of the global variables found in the PC-Kimmo function library. The `kimmo.h' header file contains the extern declarations for all of these variables.

bCancelKimmoOperation_g
=======================

Syntax
------

     #include "kimmo.h"

     extern int bCancelKimmoOperation_g;

Description
-----------

`bCancelKimmoOperation_g' can be set asynchronously to interrupt a PC-Kimmo parse that seems to be stuck.

Example
-------

     #include <signal.h>
     #include "kimmo.h"
     #include "patr.h"
     ...
     void sigint_handler(int iSignal_in)
     {
         bCancelKimmoOperation_g = TRUE;
         bCancelPATROperation_g  = TRUE;    /* remember embedded PATR parser */
         signal(SIGINT, sigint_handler);    /* reinstall the handler */
     }
     ...
     signal(SIGINT, sigint_handler);
     ...

Source File
-----------

`kimmdata.c'

cKimmoPatchSep_g
================

Syntax
------

     #include "kimmo.h"

     extern const char cKimmoPatchSep_g;

Description
-----------

`cKimmoPatchSep_g' is used to separate the revision and patch level values when printing the PC-Kimmo version number. `'a'' indicates an alpha release, `'b'' indicates a beta release, and `'.'' indicates a production release.

Example
-------

See the example for `iKimmoVersion_g' below.

Source File
-----------

`kimmdata.c'

iKimmoPatchlevel_g
==================

Syntax
------

     #include "kimmo.h"

     extern const int iKimmoPatchlevel_g;

Description
-----------

`iKimmoPatchlevel_g' is the current "patch level" of the PC-Kimmo function library and program. This is the third level version number, reflecting bug fixes or internal improvements that should be functionally invisible to users.
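As a rough illustration of how the version, revision, patch separator, and patch level combine into one version string, the following self-contained sketch formats the four values into a caller-supplied buffer. The function `formatSampleVersion' and its parameters are hypothetical stand-ins, not part of the PC-Kimmo library; a real program would pass the library's global variables instead.

```c
#include <assert.h>
#include <stdio.h>

/*
 *  Hypothetical helper (not part of the PC-Kimmo library): format a
 *  version string such as "2.1.0" or "2.1b3" into the caller's buffer.
 *  Returns the number of characters written, or -1 if the buffer is
 *  too small to hold the whole string.
 */
int formatSampleVersion(char * pszBuffer_out, size_t uiSize_in,
                        int iVersion_in, int iRevision_in,
                        char cPatchSep_in, int iPatchlevel_in)
{
    int iLength;

    assert(pszBuffer_out != NULL);
    iLength = snprintf(pszBuffer_out, uiSize_in, "%d.%d%c%d",
                       iVersion_in, iRevision_in,
                       cPatchSep_in, iPatchlevel_in);
    if ((iLength < 0) || ((size_t)iLength >= uiSize_in))
        return -1;
    return iLength;
}
```

Under these assumptions, a separator of `'.'' yields a production-style string, while `'a'' or `'b'' marks the string as an alpha or beta release.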
Example
-------

See the example for `iKimmoVersion_g' below.

Source File
-----------

`kimmdata.c'

iKimmoRevision_g
================

Syntax
------

     #include "kimmo.h"

     extern const int iKimmoRevision_g;

Description
-----------

`iKimmoRevision_g' is the current "revision level" of the PC-Kimmo function library and program. This is the second level version number, reflecting changes to program behavior that require changes to the `PC-Kimmo Reference Manual'.

Example
-------

See the example for `iKimmoVersion_g' below.

Source File
-----------

`kimmdata.c'

iKimmoVersion_g
===============

Syntax
------

     #include "kimmo.h"

     extern const int iKimmoVersion_g;

Description
-----------

`iKimmoVersion_g' is the current "version" number of the PC-Kimmo function library and program. This is the top level version number, reflecting a major rewrite of the program or major changes that make it incompatible with earlier versions of the program.

Example
-------

     #include <stdio.h>
     #include "kimmo.h"
     ...
     fprintf(stderr, "PC-Kimmo version %d.%d%c%d (%s), Copyright %s SIL\n",
             iKimmoVersion_g, iKimmoRevision_g, cKimmoPatchSep_g,
             iKimmoPatchlevel_g, pszKimmoDate_g, pszKimmoYear_g);
     #ifdef __DATE__
     fprintf(stderr, pszKimmoCompileFormat_g,
             pszKimmoCompileDate_g, pszKimmoCompileTime_g);
     #else
     if (pszKimmoTestVersion_g != NULL)
         fputs(pszKimmoTestVersion_g, stderr);
     #endif
     ...

Source File
-----------

`kimmdata.c'

pszKimmoCompileDate_g
=====================

Syntax
------

     #include "kimmo.h"

     #ifdef __DATE__
     extern const char * pszKimmoCompileDate_g;
     #endif

Description
-----------

`pszKimmoCompileDate_g' points to a string containing the date on which the PC-Kimmo library was compiled. It exists only if the C compiler preprocessor supports the `__DATE__' constant.

Example
-------

See the example for `iKimmoVersion_g' above.
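The conditional declaration in the Syntax section relies on the predefined `__DATE__' macro. A minimal sketch of that pattern follows, assuming a hypothetical module variable `pszSampleCompileDate_m' and accessor `getSampleCompileDate'. Neither name comes from the library, and this is not necessarily how `kimmdata.c' defines its own constant.

```c
#include <assert.h>
#include <stddef.h>

/*
 *  Hypothetical stand-in for a compile-date constant: it is defined
 *  only when the preprocessor provides __DATE__ ("Mmm dd yyyy").
 */
#ifdef __DATE__
static const char * pszSampleCompileDate_m = __DATE__;
#endif

/*
 *  Return the compile date string, or NULL when __DATE__ is not
 *  available to the preprocessor.
 */
const char * getSampleCompileDate(void)
{
#ifdef __DATE__
    /* sanity check: the compiler should never supply an empty date */
    assert(pszSampleCompileDate_m[0] != '\0');
    return pszSampleCompileDate_m;
#else
    return NULL;
#endif
}
```

Returning `NULL' in the fallback branch lets callers test for the missing date at run time instead of repeating the `#ifdef' at every use.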
Source File ----------- `kimmdata.c' pszKimmoCompileFormat_g ======================= Syntax ------ #include "kimmo.h" #ifdef __DATE__ #ifdef __TIME__ extern const char * pszKimmoCompileFormat_g; #endif #endif Description ----------- `pszKimmoCompileFormat_g' points to a `printf' style format string suitable for displaying `pszKimmoCompileDate_g' and `pszKimmoCompileTime_g'. It exists only if the C compiler preprocessor supports the `__DATE__' and `__TIME__' constants. Example ------- See the example for `iKimmoVersion_g' above. Source File ----------- `kimmdata.c' pszKimmoCompileTime_g ===================== Syntax ------ #include "kimmo.h" #ifdef __TIME__ extern const char * pszKimmoCompileTime_g; #endif Description ----------- `pszKimmoCompileTime_g' points to a string containing the time at which the PC-Kimmo library was compiled. It exists only if the C compiler preprocessor supports the `__TIME__' constant. Example ------- See the example for `iKimmoVersion_g' above. Source File ----------- `kimmdata.c' pszKimmoDate_g ============== Syntax ------ #include "kimmo.h" extern const char * pszKimmoDate_g; Description ----------- `pszKimmoDate_g' points to a string containing the date on which the PC-Kimmo library was last modified. Example ------- See the example for `iKimmoVersion_g' above. Source File ----------- `kimmdata.c' pszKimmoTestVersion_g ===================== Syntax ------ #include "kimmo.h" #ifndef __DATE__ extern const char * pszKimmoTestVersion_g; #endif Description ----------- `pszKimmoTestVersion_g' points to a string describing the test status of PC-Kimmo (either alpha or beta). If this is a production release version, it is set to `NULL'. It is defined only if the C compiler preprocessor does not support the `__DATE__' constant. Example ------- See the example for `iKimmoVersion_g' above. 
Source File ----------- `kimmdata.c' pszKimmoYear_g ============== Syntax ------ #include "kimmo.h" extern const char * pszKimmoYear_g; Description ----------- `pszKimmoYear_g' points to a string containing the year in which the PC-Kimmo library was last modified. This is suitable for a copyright notice assigning the copyright to SIL International. Example ------- See the example for `iKimmoVersion_g' above. Source File ----------- `kimmdata.c' uiKimmoCharArraySize_g ====================== Syntax ------ #include "kimmo.h" extern size_t uiKimmoCharArraySize_g; Description ----------- `uiKimmoCharArraySize_g' determines how big a buffer is allocated for holding strings loaded from the PC-Kimmo lexicon. A larger size reduces the number of calls to `malloc', and the amount of memory overhead lost for each allocation, but increases the amount of memory wasted by not being used. The default value is `8000'. Setting `uiKimmoCharArraySize_g' to `0' causes each lexicon string to be individually allocated with `malloc'. Example ------- #include "kimmo.h" ... unsigned char szLexiconFile_g[256]; KimmoData sKimmoData_g; ... uiKimmoCharArraySize_g = 16364; uiKimmoLexItemArraySize_g = 16364; uiKimmoShortArraySize_g = 16364; if (loadKimmoLexicon(szLexiconFile_g, KIMMO_ANALYSIS, &sKimmoData_g) != 0) { reportError(ERROR_MSG, "Cannot open lexicon file %s\n", szLexiconFile_g); } ... Source File ----------- `lexicon.c' uiKimmoLexItemArraySize_g ========================= Syntax ------ #include "kimmo.h" extern size_t uiKimmoLexItemArraySize_g; Description ----------- `uiKimmoLexItemArraySize_g' determines how many lexical item data structures are allocated at a time while loading the PC-Kimmo lexicon. A larger size reduces the number of calls to `malloc', and the amount of memory overhead lost for each allocation, but increases the amount of memory wasted by not being used. The default value is `1000'. 
Setting `uiKimmoLexItemArraySize_g' to `0' causes each lexical item data structure to be individually allocated with `malloc'.

Example
-------

See the example for `uiKimmoCharArraySize_g' above.

Source File
-----------

`lexicon.c'

uiKimmoShortArraySize_g
=======================

Syntax
------

     #include "kimmo.h"

     extern size_t uiKimmoShortArraySize_g;

Description
-----------

`uiKimmoShortArraySize_g' determines how large an array of short integers is allocated for dispensing to the individual lexical items while loading the PC-Kimmo lexicon. A larger size reduces the number of calls to `malloc', and the amount of memory overhead lost for each allocation, but increases the amount of memory wasted by not being used. The default value is `2000'. Setting `uiKimmoShortArraySize_g' to `0' causes each lexical item data structure's array of short integers to be individually allocated with `malloc'.

Example
-------

See the example for `uiKimmoCharArraySize_g' above.

Source File
-----------

`lexicon.c'

PC-Kimmo functions
******************

This chapter gives the proper usage information about each of the functions found in the PC-Kimmo function library. The prototypes and type definitions relevant to the use of these functions are all found in the `kimmo.h' header file.

applyKimmoGenerator
===================

Syntax
------

     #include "kimmo.h"

     KimmoResult * applyKimmoGenerator(unsigned char * pszLexForm_in,
                                       KimmoData *     pKimmo_in);

Description
-----------

`applyKimmoGenerator' tries to generate the surface form of a word from the provided lexical form. The PC-Kimmo rules must be loaded before this function is called.

The arguments to `applyKimmoGenerator' are as follows:

`pszLexForm_in'
     points to a character string containing the lexical (underlying) form of a word.
`pKimmo_in'
     points to the data for the current language.

Return Value
------------

a pointer to a list of results, or NULL if unsuccessful

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "kimmo.h"
     ...
     KimmoData sKimmoData_g;
     ...
     void do_generate(unsigned char * pszForm_in)
     {
         unsigned char * pszLexForm;
         KimmoResult *   pResults;

         /* the rules must be loaded, and there must be a form to process */
         if ((sKimmoData_g.ppszAlphabet == NULL) || (pszForm_in == NULL))
             return;
         /* skip any leading whitespace */
         pszLexForm = pszForm_in + strspn((char *)pszForm_in, " \t\r\n\f");
         if (*pszLexForm == '\0')
             return;
         if (sKimmoData_g.pLogFP != NULL)
             fprintf(sKimmoData_g.pLogFP, "%s\n", pszLexForm);
         pResults = applyKimmoGenerator(pszLexForm, &sKimmoData_g);
         writeKimmoResults(pResults, stderr, &sKimmoData_g);
         if (sKimmoData_g.pLogFP != NULL)
             writeKimmoResults(pResults, sKimmoData_g.pLogFP, &sKimmoData_g);
         freeKimmoResult(pResults);
     }

Source File
-----------

`generate.c'

applyKimmoRecognizer
====================

Syntax
------

     #include "kimmo.h"

     KimmoResult * applyKimmoRecognizer(unsigned char * pszSurfaceForm_in,
                                        KimmoData *     pKimmo_in);

Description
-----------

`applyKimmoRecognizer' tries to analyze the provided surface form of a word to create the lexical (underlying) form divided into morphemes. If the word can be divided into morphemes, and a word grammar has been loaded, `applyKimmoRecognizer' also tries to parse the list of morphemes to create a word parse chart with related feature structures. The PC-Kimmo rules and lexicon must be loaded before `applyKimmoRecognizer' is called. If a word parse is desired, the word grammar must also be loaded before calling this function.

The arguments to `applyKimmoRecognizer' are as follows:

`pszSurfaceForm_in'
     points to a character string containing the surface form of a word.
`pKimmo_in'
     points to the data for the current language.

Return Value
------------

a pointer to a list of results, or NULL if unsuccessful

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "kimmo.h"
     ...
     KimmoData sKimmoData_g;
     ...
     void do_recognize(unsigned char * pszForm_in)
     {
         unsigned char * pszSurfForm;
         KimmoResult *   pResults;

         /* the rules and lexicon must be loaded before recognizing */
         if (    (sKimmoData_g.ppszAlphabet     == NULL) ||
                 (sKimmoData_g.pLexiconSections == NULL) ||
                 (pszForm_in                    == NULL) )
             return;
         /* skip any leading whitespace */
         pszSurfForm = pszForm_in + strspn((char *)pszForm_in, " \t\r\n\f");
         if (*pszSurfForm == '\0')
             return;
         if (sKimmoData_g.pLogFP != NULL)
             fprintf(sKimmoData_g.pLogFP, "%s\n", pszSurfForm);
         pResults = applyKimmoRecognizer(pszSurfForm, &sKimmoData_g);
         writeKimmoResults(pResults, stderr, &sKimmoData_g);
         if (sKimmoData_g.pLogFP != NULL)
             writeKimmoResults(pResults, sKimmoData_g.pLogFP, &sKimmoData_g);
         freeKimmoResult(pResults);
     }

Source File
-----------

`recogniz.c'

applyKimmoSynthesizer
=====================

Syntax
------

     #include "kimmo.h"

     KimmoResult * applyKimmoSynthesizer(unsigned char * pszMorphemes_in,
                                         KimmoData *     pKimmo_in);

Description
-----------

`applyKimmoSynthesizer' tries to synthesize a word from a string containing an ordered list of morpheme names (glosses) separated by spaces. The PC-Kimmo rules and synthesis lexicon must be loaded before this function is called.

The arguments to `applyKimmoSynthesizer' are as follows:

`pszMorphemes_in'
     points to a character string containing an ordered list of morpheme names (glosses) separated by spaces.
`pKimmo_in'
     points to the data for the current language. The lexicon stored in the data must be accessible by gloss (morpheme name) rather than by lexical (underlying) form.

Return Value
------------

a pointer to a list of results, or NULL if unsuccessful

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "kimmo.h"
     ...
     KimmoData sKimmoData_g;
     KimmoData sSynthesisData_g;
     ...
     /*
      *  copy the control settings and the loaded rules from the analysis
      *  data so that the synthesis data differs only in its lexicon
      */
     static void fix_synthesis_data(void)
     {
         sSynthesisData_g.bLimit               = sKimmoData_g.bLimit;
         sSynthesisData_g.iTraceLevel          = sKimmoData_g.iTraceLevel;
         sSynthesisData_g.bUsePATR             = sKimmoData_g.bUsePATR;
         sSynthesisData_g.bSilent              = sKimmoData_g.bSilent;
         sSynthesisData_g.bShowWarnings        = sKimmoData_g.bShowWarnings;
         sSynthesisData_g.bAlignment           = sKimmoData_g.bAlignment;
         sSynthesisData_g.cGlossBegin          = sKimmoData_g.cGlossBegin;
         sSynthesisData_g.cGlossEnd            = sKimmoData_g.cGlossEnd;
         sSynthesisData_g.cComment             = sKimmoData_g.cComment;
         sSynthesisData_g.pLogFP               = sKimmoData_g.pLogFP;
         sSynthesisData_g.ppszAlphabet         = sKimmoData_g.ppszAlphabet;
         sSynthesisData_g.uiAlphabetSize       = sKimmoData_g.uiAlphabetSize;
         sSynthesisData_g.cNull                = sKimmoData_g.cNull;
         sSynthesisData_g.cAny                 = sKimmoData_g.cAny;
         sSynthesisData_g.cBoundary            = sKimmoData_g.cBoundary;
         sSynthesisData_g.bTwoLCFile           = sKimmoData_g.bTwoLCFile;
         sSynthesisData_g.pSubsets             = sKimmoData_g.pSubsets;
         sSynthesisData_g.uiSubsetCount        = sKimmoData_g.uiSubsetCount;
         sSynthesisData_g.pAutomata            = sKimmoData_g.pAutomata;
         sSynthesisData_g.uiAutomataSize       = sKimmoData_g.uiAutomataSize;
         sSynthesisData_g.pFeasiblePairs       = sKimmoData_g.pFeasiblePairs;
         sSynthesisData_g.uiFeasiblePairsCount = sKimmoData_g.uiFeasiblePairsCount;
         sSynthesisData_g.pszRulesFile         = sKimmoData_g.pszRulesFile;
         memset(&sSynthesisData_g.sPATR, 0, sizeof(PATRData));
     }

     void do_synthesize(unsigned char * pszForm_in)
     {
         unsigned char * pszMorphForm;
         KimmoResult *   pResults;

         /* the rules and the synthesis lexicon must both be loaded */
         if (    (sKimmoData_g.ppszAlphabet         == NULL) ||
                 (sSynthesisData_g.pLexiconSections == NULL) ||
                 (pszForm_in                        == NULL) )
             return;
         /* skip any leading whitespace */
         pszMorphForm = pszForm_in + strspn((char *)pszForm_in, " \t\r\n\f");
         if (*pszMorphForm == '\0')
             return;
         fix_synthesis_data();
         if (sKimmoData_g.pLogFP != NULL)
             fprintf(sKimmoData_g.pLogFP, "%s\n", pszMorphForm);
         /* synthesis needs the data whose lexicon is accessed by gloss */
         pResults = applyKimmoSynthesizer(pszMorphForm, &sSynthesisData_g);
         writeKimmoResults(pResults, stderr, &sKimmoData_g);
         if (sKimmoData_g.pLogFP != NULL)
             writeKimmoResults(pResults,
                               sKimmoData_g.pLogFP, &sKimmoData_g);
         freeKimmoResult(pResults);
     }

Source File
-----------

`synthesi.c'

checkKimmoRuleStatus
====================

Syntax
------

     #include "kimmo.h"

     int checkKimmoRuleStatus(int         iRule_in,
                              KimmoData * pKimmo_in);

Description
-----------

`checkKimmoRuleStatus' checks whether or not the given rule is active.

The arguments to `checkKimmoRuleStatus' are as follows:

`iRule_in'
     is the number of a loaded PC-Kimmo rule. The first rule is numbered 1, not 0 as in C arrays.
`pKimmo_in'
     points to the data for the current language.

Return Value
------------

TRUE if the given rule is active, otherwise FALSE

Example
-------

     #include <stdio.h>
     #include "kimmo.h"
     ...
     void show_rule_status(KimmoData * pKimmo_in)
     {
         int i;
         int iCount;
         int iWidth;
         int bActive;

         if (pKimmo_in->uiAutomataSize == 0)
         {
             fprintf(stderr, "    There are no rules.\n");
             return;
         }
         /* count the active rules (rule numbers start at 1) */
         for ( iCount = 0, i = 1 ; i <= pKimmo_in->uiAutomataSize ; ++i )
         {
             if (checkKimmoRuleStatus(i, pKimmo_in))
                 ++iCount;
         }
         if (iCount == pKimmo_in->uiAutomataSize)
         {
             fprintf(stderr, "    Rules are ALL ON.\n");
             return;
         }
         if (iCount == 0)
         {
             fprintf(stderr, "    Rules are ALL OFF.\n");
             return;
         }
         /* list each rule, wrapping the output near column 72 */
         fprintf(stderr, "    Rules are");
         iWidth = 13;
         for ( i = 1 ; i <= pKimmo_in->uiAutomataSize ; ++i )
         {
             if (iWidth == 0)
             {
                 fputs("             ", stderr);
                 iWidth = 13;
             }
             bActive = checkKimmoRuleStatus(i, pKimmo_in);
             if (i < pKimmo_in->uiAutomataSize)
             {
                 fprintf(stderr, "%3d %s", i, bActive ? "ON, ":"OFF,");
                 iWidth += 8;
             }
             else
             {
                 fprintf(stderr, "%3d %s", i, bActive ?
"ON ":"OFF"); iWidth += 7; } if (iWidth >= 72) { putc( '\n', stderr); iWidth = 0; } } if (iLength != 0) putc( '\n', stderr); } Source File ----------- `rules.c' concatKimmoMorphFeatures ======================== Syntax ------ #include "kimmo.h" unsigned char * concatKimmoMorphFeatures( KimmoMorpheme * pMorphemes_in, char * pszSeparate_in, KimmoData * pKimmo_in); Description ----------- `concatKimmoMorphFeatures' concatenates the feature names from a list of morphemes created by `applyKimmoRecognizer' as part of a `KimmoResult' data structure. The arguments to `concatKimmoMorphFeatures' are as follows: `pMorphemes_in' points to a list of morpheme data structures, usually the `pAnalysis' element of a `KimmoResult' data structure. `pszSeparate_in' points to a character string used to separate the feature names in the result. `pKimmo_in' points to the data for the current language. Return Value ------------ a pointer to a dynamically allocated string containing the concatenated feature names from a list of morphemes, or NULL Example ------- #include #include #include "kimmo.h" #include "patr.h" #include "opaclib.h" ... 
void write_as_WordTemplate(unsigned char * pszForm_in, KimmoResult * pResults_in, KimmoData * pKimmo_in, FILE * pOutputFP_in) { KimmoResult * pResult; WordAnalysis * pAnal; WordTemplate * pWord; if ((pszForm_in == NULL) || (pOutputFP_in == NULL)) return; /* * allocate and initialize a WordTemplate structure */ pWord = (WordTemplate *)allocMemory(sizeof(WordTemplate)); pWord->pszFormat = NULL; pWord->pszOrigWord = pszForm_in; pWord->paWord = NULL; pWord->pszNonAlpha = NULL; pWord->iCapital = 0; pWord->iOutputFlags = WANT_DECOMPOSITION | WANT_FEATURES | WANT_UNDERLYING | WANT_ORIGINAL; pWord->pAnalyses = NULL; pWord->pNewWords = NULL; /* * convert the results into a list of WordAnalysis structures */ for ( pResult = pResults_in ; pResult ; pResult = pResult->pNext ) { pAnal = (WordAnalysis *)allocMemory(sizeof(WordAnalysis)); pAnal->pszAnalysis = (char *)concatKimmoMorphGlosses( pResult->pAnalysis, " ", pKimmo_in); pAnal->pszDecomposition = (char *)concatKimmoMorphLexemes( pResult->pAnalysis, "-", pKimmo_in); pAnal->pszCategory = NULL; pAnal->pszProperties = NULL; pAnal->pszFeatures = (char *)concatKimmoMorphFeatures( pResult->pAnalysis, " ", pKimmo_in); pAnal->pszUnderlyingForm = duplicateString(pResult->pszResult); pAnal->pszSurfaceForm = pszForm_in; pAnal->pNext = pWord->pAnalyses; pWord->pAnalyses = pAnal; } /* * write the WordTemplate data and free the memory it used */ writeTemplate(pOutputFP_in, NULL, pWord, NULL); pWord->pszOrigWord = NULL; for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext ) pAnal->pszSurfaceForm = NULL; freeWordTemplate(pWord); } Source File ----------- `pckfuncs.c' concatKimmoMorphGlosses ======================= Syntax ------ #include "kimmo.h" unsigned char * concatKimmoMorphGlosses( KimmoMorpheme * pMorphemes_in, char * pszSeparate_in, KimmoData * pKimmo_in); Description ----------- `concatKimmoMorphGlosses' concatenates the glosses from a list of morphemes created by `applyKimmoRecognizer' as part of a `KimmoResult' data 
structure. The arguments to `concatKimmoMorphGlosses' are as follows: `pMorphemes_in' points to a list of morpheme data structures, usually the `pAnalysis' element of a `KimmoResult' data structure. `pszSeparate_in' points to a character string used to separate the morpheme glosses in the result. `pKimmo_in' points to the data for the current language. Return Value ------------ a pointer to a dynamically allocated string containing the concatenated glosses from a list of morphemes, or NULL Example ------- See the example for `concatKimmoMorphFeatures' above. Source File ----------- `pckfuncs.c' concatKimmoMorphLexemes ======================= Syntax ------ #include "kimmo.h" unsigned char * concatKimmoMorphLexemes( KimmoMorpheme * pMorphemes_in, char * pszSeparate_in, KimmoData * pKimmo_in); Description ----------- `concatKimmoMorphLexemes' concatenates the lexical (underlying) forms from a list of morphemes created by `applyKimmoRecognizer' as part of a `KimmoResult' data structure. The arguments to `concatKimmoMorphLexemes' are as follows: `pMorphemes_in' points to a list of morpheme data structures, usually the `pAnalysis' element of a `KimmoResult' data structure. `pszSeparate_in' points to a character string used to separate the morpheme lexical forms in the result. `pKimmo_in' points to the data for the current language. Return Value ------------ a pointer to a dynamically allocated string containing the concatenated lexical forms from a list of morphemes, or NULL Example ------- See the example for `concatKimmoMorphFeatures' above. Source File ----------- `pckfuncs.c' freeKimmoLexicon ================ Syntax ------ #include "kimmo.h" void freeKimmoLexicon(KimmoData * pKimmo_io); Description ----------- `freeKimmoLexicon' frees the memory used to store the lexicon portion of the KimmoData information. `freeKimmoLexicon' has only one argument: `pKimmo_io' points to the data for the current language, which includes the lexicon. 
Return Value
------------

none

Example
-------

     #include <string.h>
     #include "kimmo.h"
     #include "patr.h"
     ...
     KimmoData sKimmoData_g;
     KimmoData sSynthesisData_g;     /* for synthesis lexicon */
     ...
     static void reset_synthesis_data()
     {
     sSynthesisData_g.bLimit               = FALSE;
     sSynthesisData_g.iTraceLevel          = 0;
     sSynthesisData_g.bUsePATR             = FALSE;
     sSynthesisData_g.bSilent              = FALSE;
     sSynthesisData_g.bShowWarnings        = FALSE;
     sSynthesisData_g.bAlignment           = FALSE;
     sSynthesisData_g.cGlossBegin          = '\0';
     sSynthesisData_g.cGlossEnd            = '\0';
     sSynthesisData_g.cComment             = '\0';
     sSynthesisData_g.pLogFP               = NULL;
     sSynthesisData_g.ppszAlphabet         = NULL;
     sSynthesisData_g.uiAlphabetSize       = 0;
     sSynthesisData_g.cNull                = '\0';
     sSynthesisData_g.cAny                 = '\0';
     sSynthesisData_g.cBoundary            = '\0';
     sSynthesisData_g.bTwoLCFile           = FALSE;
     sSynthesisData_g.pSubsets             = NULL;
     sSynthesisData_g.uiSubsetCount        = 0;
     sSynthesisData_g.pAutomata            = NULL;
     sSynthesisData_g.uiAutomataSize       = 0;
     sSynthesisData_g.pFeasiblePairs       = NULL;
     sSynthesisData_g.uiFeasiblePairsCount = 0;
     sSynthesisData_g.pszRulesFile         = NULL;
     memset(&sSynthesisData_g.sPATR, 0, sizeof(PATRData));
     }

     void do_clear()
     {
     freeKimmoRules(&sKimmoData_g);
     freeKimmoLexicon(&sKimmoData_g);
     freePATRGrammar(&sKimmoData_g.sPATR);
     sKimmoData_g.bUsePATR = FALSE;
     freePATRInternalMemory();
     reset_synthesis_data();         /* prevent double freeing */
     freeKimmoLexicon(&sSynthesisData_g);
     }

Source File
-----------

`lexicon.c'

freeKimmoResult
===============

Syntax
------

     #include "kimmo.h"

     void freeKimmoResult(KimmoResult * pResults_io);

Description
-----------

`freeKimmoResult' frees the memory used by a list of `KimmoResult' data
structures.

`freeKimmoResult' has only one argument:

`pResults_io'
     points to a dynamically allocated list of `KimmoResult' data
     structures.

Return Value
------------

none

Example
-------

See the examples for `applyKimmoGenerator', `applyKimmoRecognizer', or
`applyKimmoSynthesizer' above.
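The following self-contained sketch illustrates the ownership rule
implied by the description: a result list is dynamically allocated, so
it has to be released once the caller is finished with it.  The sketch
does not call the library.  `DemoResult', `demo_push_result', and
`demo_free_results' are hypothetical stand-ins invented for this
illustration; the real `KimmoResult' structure is declared in `kimmo.h',
and `freeKimmoResult' may well release members that are not modelled
here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a result node.  The real KimmoResult is
 * declared in kimmo.h and has further members (for example pAnalysis). */
typedef struct demo_result {
    struct demo_result * pNext;      /* next result in the list */
    char *               pszResult;  /* dynamically allocated string */
} DemoResult;

/* Prepend a copy of pszText_in to the list.  Returns the new head, or
 * the unchanged old head if memory cannot be allocated. */
DemoResult * demo_push_result(DemoResult * pHead_in, const char * pszText_in)
{
    DemoResult * pNew;

    assert(pszText_in != NULL);
    pNew = (DemoResult *)malloc(sizeof(DemoResult));
    if (pNew == NULL)
        return pHead_in;
    pNew->pszResult = (char *)malloc(strlen(pszText_in) + 1);
    if (pNew->pszResult == NULL) {
        free(pNew);
        return pHead_in;
    }
    strcpy(pNew->pszResult, pszText_in);
    pNew->pNext = pHead_in;
    return pNew;
}

/* Free every node of the list and return the number of nodes freed.
 * The link to the next node is saved before the current node is
 * released, so no freed memory is ever read. */
int demo_free_results(DemoResult * pResults_io)
{
    int iCount = 0;

    while (pResults_io != NULL) {
        DemoResult * pNext = pResults_io->pNext;
        free(pResults_io->pszResult);
        free(pResults_io);
        pResults_io = pNext;
        ++iCount;
    }
    return iCount;
}
```

After such a call the caller's pointer no longer refers to valid memory;
setting it to NULL afterwards guards against accidental reuse.  Passing
NULL to the sketch is harmless and frees nothing.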
Source File
-----------

`pckfuncs.c'

freeKimmoRules
==============

Syntax
------

     #include "kimmo.h"

     void freeKimmoRules(KimmoData * pKimmo_io);

Description
-----------

`freeKimmoRules' frees the memory used to store the rules portion of
the KimmoData information.

`freeKimmoRules' has only one argument:

`pKimmo_io'
     points to the data for the current language, which includes the
     rules.

Return Value
------------

none

Example
-------

See the example for `freeKimmoLexicon' above.

Source File
-----------

`rules.c'

loadKimmoLexicon
================

Syntax
------

     #include "kimmo.h"

     int loadKimmoLexicon(unsigned char * pszLexiconFile_in,
                          int             eLexiconType_in,
                          KimmoData *     pKimmo_io);

Description
-----------

`loadKimmoLexicon' loads a PC-Kimmo lexicon, starting with the primary
lexicon file.  If a lexicon has already been loaded, then the existing
lexicon is erased before this lexicon file is read.

The arguments to `loadKimmoLexicon' are as follows:

`pszLexiconFile_in'
     points to a character string containing the name of the primary
     lexicon file.

`eLexiconType_in'
     has one of these two values:

     `KIMMO_ANALYSIS'
          means that the morphemes are accessed by lexical (underlying)
          form.

     `KIMMO_SYNTHESIS'
          means that the morphemes are accessed by morpheme name
          (gloss).

`pKimmo_io'
     points to the data for the current language, which includes the
     lexicon.

Return Value
------------

zero if successful, -1 if an error occurs

Example
-------

     #include "kimmo.h"
     #include "patr.h"
     ...
     KimmoData sKimmoData_g;
     ...
     /*
      *  load the PC-Kimmo data files.
      *  return the number of files successfully loaded (0-3)
      */
     int load_kimmo_files(char * pszRules_in,
                          char * pszLexicon_in,
                          char * pszGrammar_in)
     {
     if (loadKimmoRules(pszRules_in, &sKimmoData_g) != 0)
         return 0;
     if (loadKimmoLexicon(pszLexicon_in, KIMMO_ANALYSIS,
                          &sKimmoData_g) != 0)
         return 1;
     if (loadPATRGrammar(pszGrammar_in, &sKimmoData_g.sPATR) == 0)
         return 2;
     return 3;
     }

Source File
-----------

`file.c'

loadKimmoRules
==============

Syntax
------

     #include "kimmo.h"

     int loadKimmoRules(unsigned char * pszRuleFile_in,
                        KimmoData *     pKimmo_io);

Description
-----------

`loadKimmoRules' loads a PC-Kimmo rules file.  If rules have already
been loaded, then the existing rules and lexicon are erased before this
rules file is read.

The arguments to `loadKimmoRules' are as follows:

`pszRuleFile_in'
     points to a character string containing the name of the PC-Kimmo
     rules file.

`pKimmo_io'
     points to the data for the current language, which includes the
     rules.

Return Value
------------

zero if okay, -1 if an error occurs

Example
-------

See the example for `loadKimmoLexicon' above.

Source File
-----------

`file.c'

setKimmoRuleStatus
==================

Syntax
------

     #include "kimmo.h"

     void setKimmoRuleStatus(int         iRule_in,
                             int         bValue_in,
                             KimmoData * pKimmo_io);

Description
-----------

`setKimmoRuleStatus' sets the status (active or inactive) of a given
PC-Kimmo rule.  The set of feasible pairs is automatically recomputed as
a side effect of calling this function.

The arguments to `setKimmoRuleStatus' are as follows:

`iRule_in'
     is the number of a loaded PC-Kimmo rule.  The first rule is
     numbered 1, not 0 as in C arrays.  If `iRule_in' is equal to zero
     (`0'), then all of the rules are turned on or off according to
     `bValue_in'.

`bValue_in'
     is TRUE to make the rule active, or FALSE (zero) to make the rule
     inactive.

`pKimmo_io'
     points to the data for the current language, which includes the
     rules.
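The rule-numbering convention described above (rule numbers start at 1,
and 0 selects every rule) can be sketched in isolation.  This is only an
illustration under stated assumptions, not the library's implementation:
`DemoRules', `DEMO_MAX_RULES', and `demo_set_rule_status' are
hypothetical names, the stand-in returns a count purely so that its
effect can be checked (the real function returns nothing), and the
recomputation of feasible pairs is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RULES 16    /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the rules portion of the language data.
 * The real KimmoData structure is declared in kimmo.h. */
typedef struct {
    unsigned uiRuleCount;                /* number of loaded rules */
    int      abActive[DEMO_MAX_RULES];   /* nonzero = rule is active */
} DemoRules;

/* Set the status of rule iRule_in (numbered from 1), or of every rule
 * when iRule_in is 0.  Returns the number of rules whose status was
 * set; a rule number outside 0..uiRuleCount changes nothing and
 * returns 0. */
unsigned demo_set_rule_status(int iRule_in, int bValue_in,
                              DemoRules * pRules_io)
{
    unsigned i;

    assert(pRules_io != NULL);
    assert(pRules_io->uiRuleCount <= DEMO_MAX_RULES);

    if (iRule_in == 0) {
        for (i = 0; i < pRules_io->uiRuleCount; ++i)
            pRules_io->abActive[i] = (bValue_in != 0);
        return pRules_io->uiRuleCount;
    }
    if ((iRule_in < 0) || ((unsigned)iRule_in > pRules_io->uiRuleCount))
        return 0;
    /* rule numbers start at 1, array indexes at 0 */
    pRules_io->abActive[iRule_in - 1] = (bValue_in != 0);
    return 1;
}
```

The sketch rejects out-of-range rule numbers itself.  Whether the
library does the same is not stated here, so a caller of the real
function is safer validating the number first.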
Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include "kimmo.h"
     #include "cportlib.h"
     ...
     KimmoData sKimmoData_g;
     ...
     void do_set_rule(char * pszArgument_in, int bValue_in)
     {
     int    i;
     char * pszNumber;
     char * pszNext;

     if (    (strcasecmp(pszArgument_in, "all") == 0) ||
             (strcasecmp(pszArgument_in, "al") == 0) ||
             (strcasecmp(pszArgument_in, "a") == 0) )
         {
         setKimmoRuleStatus(0, bValue_in, &sKimmoData_g);
         return;
         }
     for (pszNumber = pszArgument_in ; *pszNumber ; pszNumber = pszNext)
         {
         i = strtol(pszNumber, &pszNext, 10);
         if (pszNext == pszNumber)
             break;
         if ((i > 0) && (i <= sKimmoData_g.uiAutomataSize))
             setKimmoRuleStatus(i, bValue_in, &sKimmoData_g);
         else
             break;
         }
     if (*pszNumber != '\0')
         fprintf(stderr, "Invalid argument to SET RULE: \"%s\"\n",
                 pszNumber);
     }

Source File
-----------

`file.c'

writeKimmoFeasiblePairs
=======================

Syntax
------

     #include "kimmo.h"

     void writeKimmoFeasiblePairs(FILE *      pOutputFP_in,
                                  KimmoData * pKimmo_in);

Description
-----------

`writeKimmoFeasiblePairs' writes a list of the current PC-Kimmo
feasible pairs to the output file.

The arguments to `writeKimmoFeasiblePairs' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language, which includes the
     current set of feasible pairs.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "kimmo.h"
     extern char * strlwr P((char * pszString_io));
     ...
     KimmoData sKimmoData_g;
     ...
     void do_list(char * pszArgument_in)
     {
     strlwr(pszArgument_in);
     if (     (strcmp(pszArgument_in, "l") == 0) ||
              (strcmp(pszArgument_in, "lexicon") == 0) )
         writeKimmoLexiconSectionNames(stderr, &sKimmoData_g);
     else if ((strcmp(pszArgument_in, "p") == 0) ||
              (strcmp(pszArgument_in, "pairs") == 0) )
         writeKimmoFeasiblePairs(stderr, &sKimmoData_g);
     else if ((strcmp(pszArgument_in, "r") == 0) ||
              (strcmp(pszArgument_in, "rules") == 0) )
         writeKimmoRulesStatus(stderr, &sKimmoData_g);
     else
         fprintf(stderr, "Invalid argument for list command: %s\n",
                 pszArgument_in);
     }

Source File
-----------

`rules.c'

writeKimmoLexiconSection
========================

Syntax
------

     #include "kimmo.h"

     int writeKimmoLexiconSection(unsigned char * pszLexSection_in,
                                  FILE *          pOutputFP_in,
                                  KimmoData *     pKimmo_in);

Description
-----------

`writeKimmoLexiconSection' writes the designated section of the
PC-Kimmo lexicon to the output file.  This is useful only for debugging
purposes.

The arguments to `writeKimmoLexiconSection' are as follows:

`pszLexSection_in'
     points to the name of a section of the PC-Kimmo lexicon.

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language, which includes the
     lexicon.

Return Value
------------

`TRUE' if successful, `FALSE' if the lexicon section does not exist

Example
-------

     #include <stdio.h>
     #include "kimmo.h"
     #include "cmd.h"
     #include "rpterror.h"
     ...
     KimmoData sKimmoData_g;
     ...
     void show_lexicon(char * pszLexName_in)
     {
     if ((pszLexName_in == NULL) || (pszLexName_in[0] == '\0'))
         {
         displayNumberedMessage(&sCmdMissingArgument_g,
                                sKimmoData_g.bSilent,
                                sKimmoData_g.bShowWarnings,
                                sKimmoData_g.pLogFP,
                                NULL, 0,
                                "SHOW LEXICON" );
         }
     else if (writeKimmoLexiconSection(pszLexName_in, stderr,
                                       &sKimmoData_g) == FALSE)
         {
         displayNumberedMessage(&sCmdBadArgument_g,
                                sKimmoData_g.bSilent,
                                sKimmoData_g.bShowWarnings,
                                sKimmoData_g.pLogFP,
                                NULL, 0,
                                "SHOW LEXICON", pszLexName_in);
         }
     }

Source File
-----------

`lexicon.c'

writeKimmoLexiconSectionNames
=============================

Syntax
------

     #include "kimmo.h"

     void writeKimmoLexiconSectionNames(FILE *      pOutputFP_in,
                                        KimmoData * pKimmo_in);

Description
-----------

`writeKimmoLexiconSectionNames' writes a list of the PC-Kimmo lexicon
section names to the output file.

The arguments to `writeKimmoLexiconSectionNames' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language, which includes the
     lexicon.

Return Value
------------

none

Example
-------

See the example for `writeKimmoFeasiblePairs' above.

Source File
-----------

`lexicon.c'

writeKimmoResults
=================

Syntax
------

     #include "kimmo.h"

     void writeKimmoResults(KimmoResult * pResults_in,
                            FILE *        pOutputFP_in,
                            KimmoData *   pKimmo_in);

Description
-----------

`writeKimmoResults' writes a list of PC-Kimmo results to the output
file.  If `pResults_in' is `NULL', then nothing is written to the output
file.

The arguments to `writeKimmoResults' are as follows:

`pResults_in'
     points to a list of PC-Kimmo processing results as produced by
     `applyKimmoGenerator', `applyKimmoRecognizer', or
     `applyKimmoSynthesizer'.

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language.

Return Value
------------

none

Example
-------

See the examples for `applyKimmoGenerator', `applyKimmoRecognizer', or
`applyKimmoSynthesizer' above.
Source File
-----------

`pckfuncs.c'

writeKimmoRule
==============

Syntax
------

     #include "kimmo.h"

     void writeKimmoRule(unsigned    uiRuleNumber_in,
                         FILE *      pOutputFP_in,
                         KimmoData * pKimmo_in);

Description
-----------

`writeKimmoRule' writes the designated PC-Kimmo rule to the output
file.  This is useful only for debugging purposes.

The arguments to `writeKimmoRule' are as follows:

`uiRuleNumber_in'
     is the 1-based index number of a PC-Kimmo rule loaded from a rules
     file.

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <stdlib.h>
     #include "kimmo.h"
     #include "cmd.h"
     ...
     KimmoData sKimmoData_g;
     ...
     void do_show_rule(char * pszArgument_in)
     {
     int k;

     if (pszArgument_in == (char *)NULL)
         {
         displayNumberedMessage(&sCmdMissingKeyword_g,
                                sKimmoData_g.bSilent,
                                sKimmoData_g.bShowWarnings,
                                sKimmoData_g.pLogFP,
                                NULL, 0,
                                "SHOW RULE");
         return;
         }
     k = atoi(pszArgument_in);
     if ( (k <= 0) || (k > sKimmoData_g.uiAutomataSize) )
         displayNumberedMessage(&sCmdBadArgument_g,
                                sKimmoData_g.bSilent,
                                sKimmoData_g.bShowWarnings,
                                sKimmoData_g.pLogFP,
                                NULL, 0,
                                "SHOW RULE", pszArgument_in);
     else
         writeKimmoRule( k, stderr, &sKimmoData_g );
     }

Source File
-----------

`rules.c'

writeKimmoRulesStatus
=====================

Syntax
------

     #include "kimmo.h"

     void writeKimmoRulesStatus(FILE *      pOutputFP_in,
                                KimmoData * pKimmo_in);

Description
-----------

`writeKimmoRulesStatus' writes the status (`"on"' or `"off"') and name
for each of the PC-Kimmo rules currently loaded from a rules file.

The arguments to `writeKimmoRulesStatus' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pKimmo_in'
     points to the data for the current language, which includes the
     rules.

Return Value
------------

none

Example
-------

See the example for `writeKimmoFeasiblePairs' above.
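As a rough picture of the kind of listing described above, the
following self-contained sketch writes one line per rule with its
1-based number, `on' or `off', and its name.  It does not call the
library, and the line layout is an assumption of the sketch; the actual
output format of `writeKimmoRulesStatus' is not specified here.
`DemoRule' and `demo_write_rules_status' are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one loaded rule. */
typedef struct {
    int          bActive;   /* nonzero = rule is on */
    const char * pszName;   /* name of the rule */
} DemoRule;

/* Write one line per rule: number (starting at 1), "on" or "off", and
 * the rule name.  Returns the number of lines written.  A NULL rule
 * array writes nothing.  The layout is an assumption of this sketch. */
unsigned demo_write_rules_status(FILE * pOutputFP_in,
                                 const DemoRule * paRules_in,
                                 unsigned uiCount_in)
{
    unsigned i;

    assert(pOutputFP_in != NULL);
    if (paRules_in == NULL)
        return 0;
    for (i = 0; i < uiCount_in; ++i)
        fprintf(pOutputFP_in, "%2u: %-3s %s\n",
                i + 1,
                paRules_in[i].bActive ? "on" : "off",
                paRules_in[i].pszName);
    return uiCount_in;
}
```

Numbering the lines from 1 keeps the listing consistent with the rule
numbers accepted by `setKimmoRuleStatus' and `writeKimmoRule'.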
Source File
-----------

`file.c'

Index
*****

Table of Contents
*****************

Introduction to the PC-Kimmo function library
Variable and function naming conventions
  Preprocessor macro names
  Data structure names
  Variable names
  Function names
  Examples
PC-Kimmo data structures
  KimmoData
  KimmoResult
The PC-Kimmo function library global variables
  bCancelKimmoOperation_g
  cKimmoPatchSep_g
  iKimmoPatchlevel_g
  iKimmoRevision_g
  iKimmoVersion_g
  pszKimmoCompileDate_g
  pszKimmoCompileFormat_g
  pszKimmoCompileTime_g
  pszKimmoDate_g
  pszKimmoTestVersion_g
  pszKimmoYear_g
  uiKimmoCharArraySize_g
  uiKimmoLexItemArraySize_g
  uiKimmoShortArraySize_g
PC-Kimmo functions
  applyKimmoGenerator
  applyKimmoRecognizer
  applyKimmoSynthesizer
  checkKimmoRuleStatus
  concatKimmoMorphFeatures
  concatKimmoMorphGlosses
  concatKimmoMorphLexemes
  freeKimmoLexicon
  freeKimmoResult
  freeKimmoRules
  loadKimmoLexicon
  loadKimmoRules
  setKimmoRuleStatus
  writeKimmoFeasiblePairs
  writeKimmoLexiconSection
  writeKimmoLexiconSectionNames
  writeKimmoResults
  writeKimmoRule
  writeKimmoRulesStatus
Index