OPAC Function Library Reference Manual
functions for linguistic data processing
July 1998

by Stephen McConnel

Copyright (C) 2000 SIL International

Published by:
     Language Software Development
     SIL International
     7500 W. Camp Wisdom Road
     Dallas, TX 75236
     U.S.A.

Permission is granted to make and distribute verbatim copies of this file
provided the copyright notice and this permission notice are preserved in
all copies.

The author may be reached at the address above or via email as
`steve@acadcomp.sil.org'.

Introduction to the OPAC function library
*****************************************

This document describes a library of data structures and functions
developed over the years for programs in the Occasional Publications in
Academic Computing series published by SIL International. (For SIL
International, "academic" refers to linguistics, literacy, anthropology,
translation, and related fields.) It is hoped that this documentation will
make future maintenance of these programs easier.

Variable and function naming conventions
****************************************

The basic goal behind choosing names in the OPAC function library is for
the name to convey information about what it represents. This is achieved
in two ways: striving for a descriptive name rather than a short, cryptic
abbreviation, and following a different pattern of capitalization for
each type of name.

Preprocessor macro names
========================

Preprocessor macro names are written entirely in capital letters. If the
name requires more than one word for an adequate description, the words
are joined together with intervening underscore (`_') characters.

Data structure names
====================

Data structure names consist of one or more capitalized words. If the
name requires more than one word for an adequate description, the words
are joined together without underscores, relying on the capitalization
pattern to make them readable as separate words.
Variable names
==============

Variable names in the OPAC function library follow a modified form of the
Hungarian naming convention described by Steve McConnell in his book
`Code Complete' on pages 202-206. Variable names have three parts: a
lowercase type prefix, a descriptive name, and a scope suffix.

Type prefix
-----------

The type prefix has the following basic possibilities:

`b'
     a Boolean, usually encoded as a `char', `short', or `int'

`c'
     a character, usually a `char' but sometimes a `short' or `int'

`d'
     a double precision floating point number, that is, a `double'

`e'
     an enumeration, encoded as an `enum' or as a `char', `short', or `int'

`i'
     an integer, that is, an `int', `short', `long', or (rarely) `char'

`s'
     a data structure defined by a `struct' statement

`sz'
     a NUL (that is, zero) terminated character string

`pf'
     a pointer to a function

In addition, the basic types may be prefixed by these qualifiers:

`u'
     indicates that an integer or a character is unsigned

`a'
     indicates an array of the basic type

`p'
     indicates a pointer to the type, possibly a pointer to an array or to
     a pointer

Descriptive name
----------------

The descriptive name portion of a variable name consists of one or more
capitalized words concatenated together. There are no underscores (`_')
separating these words from each other, or from the type prefix. For the
OPAC function library, the descriptive name for global variables may begin
with the name of the most relevant data structure, if any.
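The following C fragment is a hypothetical illustration of the type prefixes and qualifiers listed above. `SampleName' and `countActiveNames' are invented for this sketch and are not part of the library; the argument names also carry the `_in' scope suffix described in the next section.

```c
#include <stddef.h>   /* for NULL */

/* Hypothetical structure: the type name follows the data structure rule
   (capitalized words, no underscores); the fields show type prefixes. */
typedef struct sample_name {
    char *   pszLabel;     /* psz: pointer to a NUL-terminated string */
    unsigned uiUseCount;   /* ui: unsigned integer */
    char     bActive;      /* b: Boolean encoded as a char */
    double   dWeight;      /* d: double precision number */
} SampleName;

/* Count the active entries in asNames_in ("as": array of structures).
   pfAccept_in ("pf": pointer to a function) may reject some entries;
   passing NULL accepts every active entry. */
int countActiveNames(const SampleName asNames_in[], int iCount_in,
                     int (* pfAccept_in)(const SampleName * pName_in))
{
    int iActive = 0;   /* i: integer; local, so no scope suffix */
    int i;

    for (i = 0; i < iCount_in; ++i)
    {
        const SampleName * pName = &asNames_in[i];   /* p: pointer */
        if (pName->bActive && (pfAccept_in == NULL || pfAccept_in(pName)))
            ++iActive;
    }
    return iActive;
}
```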
Scope suffix
------------

The scope suffix has these possibilities:

`_g'
     indicates a global variable accessible throughout the program

`_m'
     indicates a module (semiglobal) variable accessible throughout the
     file (declared `static')

`_in'
     indicates a function argument used for input

`_out'
     indicates a function argument used for output (must be a pointer)

`_io'
     indicates a function argument used for both input and output (must
     be a pointer)

`_s'
     indicates a function variable that retains its value between calls
     (declared `static')

The lack of a scope suffix indicates that a variable is declared within a
function and exists on the stack for the duration of the current call.

Function names
==============

Global function names in the OPAC function library have two parts: a verb
that is all lowercase followed by a noun phrase containing one or more
capitalized words. These pieces are concatenated without any intervening
underscores (`_'). For the OPAC library functions, the noun phrase section
includes the name of the most relevant data structure, if any.

Examples
========

Given the discussion above, it is easy to discern at a glance what type of
item each of the following names refers to.

`SAMPLE_NAME'
     is a preprocessor macro.

`SampleName'
     is a data structure.

`pSampleName'
     is a local pointer variable.

`writeSampleName'
     is a function (that may apply to a data structure named
     `SampleName').

The OPAC function library data structures
*****************************************

This chapter describes the data structures defined for the OPAC function
library. These include both general purpose data collection structures
and specialized linguistic processing data structures. For each data
structure that the library provides, the description lists the header
files to include in your source to obtain its definition.
CaselessLetter
==============

Definition
----------

     #include "textctl.h"   /* or template.h or opaclib.h */

     typedef struct caseless_letter {
         unsigned char *          pszLetter;
         struct caseless_letter * pNext;
     } CaselessLetter;

Description
-----------

The `CaselessLetter' data structure is normally used only inside a
`TextControl' data structure. It stores a multibyte character string that
represents a single caseless letter.

The fields of the `CaselessLetter' data structure are as follows:

`pszLetter'
     points to a caseless multigraph character string. This string is one
     or more characters (bytes) long, and is terminated by a NUL byte.

`pNext'
     is a pointer to facilitate keeping a list of caseless letters.

Source File
-----------

`textctl.h'

Change
======

Definition
----------

     #include "change.h"   /* or textctl.h or template.h or opaclib.h */

     typedef struct change_list {
         char *               pszMatch;
         char *               pszReplace;
         ChangeEnvironment *  pEnvironment;
         char *               pszDescription;
         struct change_list * pNext;
     } Change;

Description
-----------

A `Change' data structure stores a single "consistent change" to apply to
character strings. Such consistent changes are usually used as ordered
lists of changes rather than being applied in isolation here and there.

The fields of the `Change' data structure are as follows:

`pszMatch'
     points to the substring to match in the original string.

`pszReplace'
     points to the string with which to replace matched substrings in the
     output.

`pEnvironment'
     points to the list of alternative environments (if any) for this
     change. See `ChangeEnvironment' below.

`pszDescription'
     points to an optional comment string that describes this change.

`pNext'
     is a pointer to facilitate keeping an ordered list of changes.
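The library's own change-application functions are not described in this section, so the following is only a minimal sketch of what applying an ordered list of changes can mean. It assumes that each change is applied to the whole string (left to right, non-overlapping matches) before the next change is tried. It uses a simplified copy of `Change' without the `pEnvironment' field, so every change here is unconditional; `applyChangeList' and its helpers are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified copy of Change: pEnvironment is omitted in this sketch. */
typedef struct change_list {
    char *               pszMatch;
    char *               pszReplace;
    char *               pszDescription;
    struct change_list * pNext;
} Change;

/* Return a malloc'd copy of psz_in, or NULL if memory runs out. */
static char * copyString(const char * psz_in)
{
    size_t uiLen = strlen(psz_in) + 1;
    char * pszCopy = malloc(uiLen);
    if (pszCopy != NULL)
        memcpy(pszCopy, psz_in, uiLen);
    return pszCopy;
}

/* Replace every non-overlapping occurrence of pszMatch, scanning left to
   right.  Returns a new malloc'd string (caller frees) or NULL. */
static char * applyOneChange(const char * pszText_in, const Change * pChange_in)
{
    size_t uiMatchLen = strlen(pChange_in->pszMatch);
    size_t uiReplLen  = strlen(pChange_in->pszReplace);
    size_t uiCount = 0;
    const char * p;
    char * pszResult;
    char * q;

    if (uiMatchLen == 0)            /* an empty match string changes nothing */
        return copyString(pszText_in);
    /* first pass: count matches so the result can be sized exactly */
    for (p = strstr(pszText_in, pChange_in->pszMatch); p != NULL;
         p = strstr(p + uiMatchLen, pChange_in->pszMatch))
        ++uiCount;
    pszResult = malloc(strlen(pszText_in) - uiCount * uiMatchLen
                       + uiCount * uiReplLen + 1);
    if (pszResult == NULL)
        return NULL;
    /* second pass: copy the text, substituting each match */
    q = pszResult;
    p = pszText_in;
    for (;;)
    {
        const char * pHit = strstr(p, pChange_in->pszMatch);
        if (pHit == NULL)
            break;
        memcpy(q, p, (size_t)(pHit - p));
        q += pHit - p;
        memcpy(q, pChange_in->pszReplace, uiReplLen);
        q += uiReplLen;
        p = pHit + uiMatchLen;
    }
    strcpy(q, p);                   /* copy the tail and its NUL */
    return pszResult;
}

/* Apply the changes in list order; each sees the previous one's output. */
char * applyChangeList(const char * pszText_in, const Change * pChanges_in)
{
    char * pszCurrent = copyString(pszText_in);
    const Change * pChange;

    for (pChange = pChanges_in; pChange != NULL && pszCurrent != NULL;
         pChange = pChange->pNext)
    {
        char * pszNext = applyOneChange(pszCurrent, pChange);
        free(pszCurrent);
        pszCurrent = pszNext;
    }
    return pszCurrent;
}
```

Because the list is ordered, a later change can act on text produced by an earlier one, which is why such changes are usually kept and applied as a list rather than individually.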
Source File
-----------

`change.h'

ChangeEnvironment
=================

Definition
----------

     #include "change.h"   /* or textctl.h or template.h or opaclib.h */

     typedef struct chg_envir {
         short              bNot;
         ChgEnvItem *       pLeftEnv;
         ChgEnvItem *       pRightEnv;
         struct chg_envir * pNext;
     } ChangeEnvironment;

Description
-----------

The `ChangeEnvironment' data structure is normally used only inside a
`Change' data structure.

The fields of the `ChangeEnvironment' data structure are as follows:

`bNot'
     indicates the negation of this environment.

`pLeftEnv'
     points to the environment to the left of the matched substring.

`pRightEnv'
     points to the environment to the right of the matched substring.

`pNext'
     points to the next alternative constraint.

Source File
-----------

`change.h'

ChgEnvItem
==========

Definition
----------

     #include "change.h"   /* or textctl.h or template.h or opaclib.h */

     typedef struct chg_env_item {
         char                  iFlags;
         union {
             char *        pszString;
             StringClass * pClass;
         } u;
         struct chg_env_item * pNext;
     } ChgEnvItem;

Description
-----------

The `ChgEnvItem' data structure is normally used only inside a
`ChangeEnvironment' data structure, which is normally used only inside a
`Change' data structure.

The fields of the `ChgEnvItem' data structure are as follows:

`iFlags & E_NOT'
     signals that this item is not wanted.

`iFlags & E_CLASS'
     signals that this item refers to a class of strings instead of a
     literal string.

`iFlags & E_ELLIPSIS'
     signals that this item may possibly not be contiguous.

`iFlags & E_OPTIONAL'
     signals that this item is optional.

`u.pszString'
     points to a literal string if `iFlags & E_CLASS' is `0'.

`u.pClass'
     points to a `StringClass' data structure if `iFlags & E_CLASS' is not
     `0'. See `StringClass' below.

`pNext'
     points to the next item in the environment, if any.
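The sketch below shows one way a right environment built from `ChgEnvItem'-like items could be tested against the text that follows a match. It is an illustration under stated assumptions, not the library's matcher: only literal items and the optional flag are handled (string classes, ellipsis, and negation are ignored), and because the numeric values of the `E_*' flags are not given in this manual, it defines its own hypothetical `SKETCH_E_OPTIONAL' bit.

```c
#include <string.h>

/* Hypothetical flag value; change.h defines the real E_* constants. */
#define SKETCH_E_OPTIONAL 0x08

/* Simplified item: literal strings only, no StringClass union member. */
typedef struct sketch_env_item {
    char                     iFlags;
    const char *             pszString;
    struct sketch_env_item * pNext;
} SketchEnvItem;

/* Return nonzero if pszText_in (the text right after a match) satisfies
   the item list: literal items must appear contiguously and in order,
   and an item flagged optional may be either present or absent. */
int matchRightEnvSketch(const SketchEnvItem * pItem_in, const char * pszText_in)
{
    size_t uiLen;

    if (pItem_in == NULL)
        return 1;                   /* every item has been matched */
    uiLen = strlen(pItem_in->pszString);
    /* try the item as present at this position */
    if (strncmp(pszText_in, pItem_in->pszString, uiLen) == 0 &&
        matchRightEnvSketch(pItem_in->pNext, pszText_in + uiLen))
        return 1;
    /* an optional item may also be skipped */
    if (pItem_in->iFlags & SKETCH_E_OPTIONAL)
        return matchRightEnvSketch(pItem_in->pNext, pszText_in);
    return 0;
}
```

Trying the "present" alternative first and falling back to "skipped" gives simple backtracking, so an optional item never prevents a later item from matching.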
Source File
-----------

`change.h'

CodeTable
=========

Definition
----------

     #include "record.h"   /* or opaclib.h */

     typedef struct {
         char *   pCodeTable;
         unsigned uiCodeCount;
         char *   pszFirstCode;
     } CodeTable;

Description
-----------

The `CodeTable' data structure is used to map between the field codes used
in a standard format file and single characters used in `case' labels
inside `switch' statements in C code.

The fields of the `CodeTable' data structure are as follows:

`pCodeTable'
     points to a primitive change string such as
     `"match1\0A\0match2\0B\0"'. Note that the replacement strings are
     assumed to be single characters.

`uiCodeCount'
     is the number of entries (match strings with replacement characters)
     in `pCodeTable'.

`pszFirstCode'
     points to the record marker string, that is, the field code that marks
     the beginning of a record in the input file. This would usually be
     one of the match strings embedded in `pCodeTable'.

Source File
-----------

`record.h'

LowerLetter
===========

Definition
----------

     #include "textctl.h"   /* or template.h or opaclib.h */

     typedef struct lower_letter {
         unsigned char *       pszLower;
         StringList *          pUpperList;
         struct lower_letter * pNext;
     } LowerLetter;

Description
-----------

The `LowerLetter' data structure is normally used only inside a
`TextControl' data structure. It stores a multibyte character string that
represents a single lowercase letter. It also stores a list of the
corresponding uppercase multigraph character strings.

The fields of the `LowerLetter' data structure are as follows:

`pszLower'
     points to a lowercase multigraph character string. This string is one
     or more characters (bytes) long, and is terminated by a NUL byte.

`pUpperList'
     points to a list of uppercase multigraph character strings. This list
     has at least one element, but may have any number of elements if the
     orthography is ambiguous about converting from lowercase to uppercase
     forms. (This is quite unlikely, but allowed by this software.)

`pNext'
     is a pointer to facilitate keeping a list of lowercase letters.

Source File
-----------

`textctl.h'

NumberedMessage
===============

Definition
----------

     #include "rpterror.h"   /* or opaclib.h */

     typedef struct {
         int      eType;
         unsigned uiNumber;
         char *   pszMessage;
     } NumberedMessage;

Description
-----------

The `NumberedMessage' data structure stores the information for a single
numbered error or warning message. This is the style of error reporting
used by the PC-Kimmo and PC-PATR programs.

The fields of the `NumberedMessage' data structure are as follows:

`eType'
     is the type of message, one of these symbolic constants:

     `ERROR_MSG'
          is a severe error that aborts the procedure.

     `WARNING_MSG'
          is a minor error that the user should be made aware of.

     `DEBUG_MSG'
          is a debugging message intended for the programmer.

`uiNumber'
     is the (unique) message number.

`pszMessage'
     is a `printf' style format string for the message.

Source File
-----------

`rpterror.h'

StringClass
===========

Definition
----------

     #include "strclass.h"  /* or change.h or textctl.h or template.h or opaclib.h */

     typedef struct string_class {
         char *                pszName;
         StringList *          pMembers;
         struct string_class * pNext;
     } StringClass;

Description
-----------

The `StringClass' data structure stores a labeled set of strings. The
intention is that any one of the set of strings may be used in a matching
operation.

The fields of the `StringClass' data structure are as follows:

`pszName'
     points to the name of the string class.

`pMembers'
     points to the list of members of the string class. See `StringList'
     below.

`pNext'
     is a pointer to facilitate keeping a list of string classes.
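To illustrate how a `StringClass' might be used in a matching operation, the following sketch looks up a class by name in a list of classes and finds the longest member that matches at the start of some text. The two structures are copied from the definitions in this manual; `findClassSketch' and `matchClassAtSketch' are hypothetical helpers, not library functions.

```c
#include <string.h>

/* Copies of the documented structures. */
typedef struct strlist {
    char *           pszString;
    struct strlist * pNext;
} StringList;

typedef struct string_class {
    char *                pszName;
    StringList *          pMembers;
    struct string_class * pNext;
} StringClass;

/* Return the class named pszName_in from the list, or NULL. */
const StringClass * findClassSketch(const StringClass * pClasses_in,
                                    const char * pszName_in)
{
    for (; pClasses_in != NULL; pClasses_in = pClasses_in->pNext)
        if (strcmp(pClasses_in->pszName, pszName_in) == 0)
            return pClasses_in;
    return NULL;
}

/* Return the length of the longest member of pClass_in that is a prefix
   of pszText_in, or -1 if no member matches there. */
int matchClassAtSketch(const StringClass * pClass_in, const char * pszText_in)
{
    const StringList * pMember;
    int iBest = -1;

    for (pMember = pClass_in->pMembers; pMember != NULL; pMember = pMember->pNext)
    {
        size_t uiLen = strlen(pMember->pszString);
        if (strncmp(pszText_in, pMember->pszString, uiLen) == 0 &&
            (int)uiLen > iBest)
            iBest = (int)uiLen;
    }
    return iBest;
}
```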
Source File
-----------

`strclass.h'

StringList
==========

Definition
----------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */

     typedef struct strlist {
         char *           pszString;
         struct strlist * pNext;
     } StringList;

Description
-----------

The `StringList' data structure is used to store a collection of character
strings. This collection may be a set (no duplicate strings), an ordered
list, or an unordered list, depending on how the programmer adds strings to
the list.

The fields of the `StringList' data structure are as follows:

`pszString'
     points to a stored string.

`pNext'
     points to the next string in the list of strings.

This is one of the most commonly used data structures in the OPAC function
library.

Source File
-----------

`strlist.h'

TextControl
===========

Definition
----------

     #include "textctl.h"   /* or template.h or opaclib.h */

     typedef struct text_control {
         char *           pszTextControlFile;
         LowerLetter *    pLowercaseLetters;
         UpperLetter *    pUppercaseLetters;
         CaselessLetter * pCaselessLetters;
         Change *         pOrthoChanges;
         Change *         pOutputChanges;
         StringList *     pIncludeFields;
         StringList *     pExcludeFields;
         unsigned char    cFormatMark;
         unsigned char    cAmbig;
         unsigned char    cDecomp;
         unsigned char    cBarMark;
         unsigned char *  pszBarCodes;
         char             bIndividualCapitalize;
         char             bCapitalize;
         unsigned         uiMaxAmbigDecap;
     } TextControl;

Description
-----------

The `TextControl' data structure is used to control reading a text file
into a (sequence of) `WordTemplate' data structure(s), or writing a
(sequence of) `WordTemplate' data structure(s) to a text file.

The fields of the `TextControl' data structure are as follows:

`pszTextControlFile'
     points to the name of the file that the data is loaded from.

`pLowercaseLetters'
     points to a list of lowercase word formation character multigraphs,
     each of which has a list of one or more corresponding uppercase
     multigraphs. This list is sorted by decreasing length of the lowercase
     multigraph string.
     See `LowerLetter' above.

`pUppercaseLetters'
     points to a list of uppercase word formation character multigraphs,
     each of which has a list of one or more corresponding lowercase
     multigraphs. This list is sorted by decreasing length of the uppercase
     multigraph string. See `UpperLetter' below.

`pCaselessLetters'
     points to a list of word formation character multigraphs that do not
     have distinct lowercase and uppercase forms. This list is sorted by
     decreasing length of the multigraph string. See `CaselessLetter'
     above.

`pOrthoChanges'
     points to an ordered list of input orthography changes. See `Change'
     above.

`pOutputChanges'
     points to an ordered list of output (orthography) changes. See
     `Change' above.

`pIncludeFields'
     points to a list of format markers (fields) to include. See
     `StringList' above.

`pExcludeFields'
     points to a list of format markers (fields) to exclude. See
     `StringList' above.

`cFormatMark'
     is the initial character of format markers.

`cAmbig'
     is the character for marking ambiguities and failures.

`cDecomp'
     is the character for marking decomposition of words into morphemes.

`cBarMark'
     is the initial character of secondary format markers.

`pszBarCodes'
     points to a string of characters for secondary format markers.

`bIndividualCapitalize'
     flags whether or not to capitalize individual letters within words.

`bCapitalize'
     flags whether or not to decapitalize (recapitalize) words.

`uiMaxAmbigDecap'
     is the maximum number of ambiguous decapitalizations allowed.

Source File
-----------

`textctl.h'

Trie
====

Definition
----------

     #include "trie.h"   /* or opaclib.h */

     typedef struct s__trienode {
         unsigned char        cLetter;
         struct s__trienode * pChildren;
         struct s__trienode * pSiblings;
         void *               pTrieInfo;
     } Trie;

Description
-----------

A trie is a data structure designed for relatively fast insertion and
relatively fast retrieval of information referenced by a "key" string. See
Knuth 1973, pages 481-505, for an extended treatment of tries.
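The following sketch shows how a lookup can walk a trie with this node layout (the fields are described next): at each depth it scans the `pSiblings' chain for the current key character and then descends through `pChildren'. It assumes, hypothetically, that the information for a key is stored at the node for the key's last character, or at the depth limit for longer keys (compare `iMaxTrieDepth_in' under `addDataToTrie'). `findTrieInfoSketch' is not a library function, and the library's own lookup may differ.

```c
#include <stddef.h>

/* Copy of the documented node structure. */
typedef struct s__trienode {
    unsigned char        cLetter;
    struct s__trienode * pChildren;
    struct s__trienode * pSiblings;
    void *               pTrieInfo;
} Trie;

/* Return the pTrieInfo stored for pszKey_in, or NULL if the key is absent.
   Only the first iMaxDepth_in characters of the key are examined. */
void * findTrieInfoSketch(const Trie * pTrie_in, const char * pszKey_in,
                          int iMaxDepth_in)
{
    const Trie * pLevel = pTrie_in;   /* alternatives at the current depth */
    const Trie * pNode = NULL;        /* node matched at the current depth */
    const unsigned char * pKey = (const unsigned char *)pszKey_in;
    int iDepth;

    for (iDepth = 0; *pKey != '\0' && iDepth < iMaxDepth_in; ++iDepth, ++pKey)
    {
        /* walk the sibling chain looking for this key character */
        for (pNode = pLevel; pNode != NULL && pNode->cLetter != *pKey;
             pNode = pNode->pSiblings)
            ;
        if (pNode == NULL)
            return NULL;              /* no stored key continues this way */
        pLevel = pNode->pChildren;    /* descend for the next character */
    }
    return (pNode != NULL) ? pNode->pTrieInfo : NULL;
}
```

This first-child/next-sibling layout keeps each node small at the cost of a linear scan of the siblings at every depth.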

The fields of the `Trie' data structure are as follows:

`cLetter'
     is the letter (key character) at this node.

`pChildren'
     points to the children `Trie' nodes, those that have `cLetter' in
     their key at this point.

`pSiblings'
     points to the sibling `Trie' nodes, those that have an alternative to
     `cLetter' in their key at this point.

`pTrieInfo'
     points to the stored information, which may be a linked list, an
     array, or anything the programmer desires.

Source File
-----------

`trie.h'

UpperLetter
===========

Definition
----------

     #include "textctl.h"   /* or template.h or opaclib.h */

     typedef struct upper_letter {
         unsigned char *       pszUpper;
         StringList *          pLowerList;
         struct upper_letter * pNext;
     } UpperLetter;

Description
-----------

The `UpperLetter' data structure is normally used only inside a
`TextControl' data structure. It stores a multibyte character string that
represents a single uppercase letter. It also stores a list of the
corresponding lowercase multigraph character strings.

The fields of the `UpperLetter' data structure are as follows:

`pszUpper'
     points to an uppercase multigraph character string. This string is
     one or more characters (bytes) long, and is terminated by a NUL byte.

`pLowerList'
     points to a list of lowercase multigraph character strings. This list
     has at least one element, but may have any number of elements if the
     orthography is ambiguous about converting from uppercase to lowercase
     forms.

`pNext'
     is a pointer to facilitate keeping a list of uppercase letters.

Application programmers should not need to use this data structure
directly, as its only use is for a list embedded in the `TextControl' data
structure.
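Because the `pUppercaseLetters' list in `TextControl' is sorted by decreasing length of the uppercase multigraph, the first entry that matches at a given position is also the longest match. The sketch below relies on that ordering to recognize an uppercase multigraph and find its first lowercase form. `matchUpperLetterSketch' is a hypothetical helper, and the two structure definitions are copied from this manual.

```c
#include <string.h>

/* Copies of the documented structures. */
typedef struct strlist {
    char *           pszString;
    struct strlist * pNext;
} StringList;

typedef struct upper_letter {
    unsigned char *       pszUpper;
    StringList *          pLowerList;
    struct upper_letter * pNext;
} UpperLetter;

/* Find the uppercase multigraph at the start of pszText_in.  Assumes
   pLetters_in is sorted by decreasing length of pszUpper, so the first
   hit is the longest.  Stores the matched length in *puiLength_out
   (0 if nothing matches) and returns the entry, or NULL. */
const UpperLetter * matchUpperLetterSketch(const UpperLetter * pLetters_in,
                                           const unsigned char * pszText_in,
                                           size_t * puiLength_out)
{
    const UpperLetter * pLetter;

    for (pLetter = pLetters_in; pLetter != NULL; pLetter = pLetter->pNext)
    {
        size_t uiLen = strlen((const char *)pLetter->pszUpper);
        if (strncmp((const char *)pszText_in,
                    (const char *)pLetter->pszUpper, uiLen) == 0)
        {
            *puiLength_out = uiLen;
            return pLetter;
        }
    }
    *puiLength_out = 0;
    return NULL;
}
```

With the entries "CH" and "C" in that order, the text "CHA" matches "CH" rather than "C", and `pLowerList->pszString' then gives a lowercase form such as "ch".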
Source File
-----------

`textctl.h'

WordAnalysis
============

Definition
----------

     #include "template.h"   /* or opaclib.h */

     typedef struct word_analysis {
         char *                 pszAnalysis;
         char *                 pszDecomposition;
         char *                 pszCategory;
         char *                 pszProperties;
         char *                 pszFeatures;
         char *                 pszUnderlyingForm;
         char *                 pszSurfaceForm;
         struct word_analysis * pNext;
     } WordAnalysis;

Description
-----------

The `WordAnalysis' data structure is normally used as part of a
`WordTemplate' data structure to record the result of morphological
analysis.

The fields of the `WordAnalysis' data structure are as follows:

`pszAnalysis'
     points to an analysis (morphname) string.

`pszDecomposition'
     points to the surface form of the word, hyphenated to show morpheme
     breaks. The "hyphen" character is typically the one given by the
     `cDecomp' field of a `TextControl' data structure.

`pszCategory'
     points to the probable word category, possibly followed by morpheme
     categories. Categories within a morpheme are separated by spaces, and
     morphemes are separated by equal signs (`=').

`pszProperties'
     points to the morpheme properties, if any. Properties within a
     morpheme are separated by spaces, and morphemes are separated by equal
     signs (`=').

`pszFeatures'
     points to the morpheme features, if any. Features within a morpheme
     are separated by spaces, and morphemes are separated by equal signs
     (`=').

`pszUnderlyingForm'
     points to the underlying morpheme forms, separated by the character
     given by the `cDecomp' field of a `TextControl' data structure.

`pszSurfaceForm'
     points to the wordform after decapitalization and orthography
     changes.

`pNext'
     is a pointer to facilitate keeping a list of alternative analyses.
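Since `pszProperties' and `pszFeatures' separate morphemes with equal signs, a program reading a `WordAnalysis' can count and extract the per-morpheme entries as in the sketch below. The helper names are hypothetical, and the sketch does not try to interpret the leading word category that `pszCategory' may contain. An empty entry between two equal signs stands for a morpheme with no properties or features.

```c
#include <string.h>

/* Count the '='-separated morpheme entries in a field such as
   pszProperties; a NULL or empty field counts as zero entries. */
int countMorphemeFieldsSketch(const char * pszField_in)
{
    int iCount;

    if (pszField_in == NULL || *pszField_in == '\0')
        return 0;
    for (iCount = 1; (pszField_in = strchr(pszField_in, '=')) != NULL;
         ++pszField_in)
        ++iCount;
    return iCount;
}

/* Copy entry number iIndex_in (counting from 0) into pszBuffer_out,
   which holds uiSize_in bytes.  Returns 1 on success, or 0 if there is
   no such entry or it does not fit. */
int getMorphemeFieldSketch(const char * pszField_in, int iIndex_in,
                           char * pszBuffer_out, size_t uiSize_in)
{
    const char * pEnd;
    size_t uiLen;

    while (iIndex_in-- > 0)           /* skip the preceding entries */
    {
        pszField_in = strchr(pszField_in, '=');
        if (pszField_in == NULL)
            return 0;
        ++pszField_in;
    }
    pEnd = strchr(pszField_in, '=');
    uiLen = (pEnd != NULL) ? (size_t)(pEnd - pszField_in) : strlen(pszField_in);
    if (uiLen >= uiSize_in)
        return 0;
    memcpy(pszBuffer_out, pszField_in, uiLen);
    pszBuffer_out[uiLen] = '\0';
    return 1;
}
```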
Source File
-----------

`template.h'

WordTemplate
============

Definition
----------

     #include "template.h"   /* or opaclib.h */

     typedef struct {
         char *         pszFormat;
         char *         pszOrigWord;
         char **        paWord;
         char *         pszNonAlpha;
         short          iCapital;
         short          iOutputFlags;
         WordAnalysis * pAnalyses;
         StringList *   pNewWords;
     } WordTemplate;

Description
-----------

The `WordTemplate' data structure is used to hold a single word for
processing, with the original capitalization and punctuation preserved for
restoration on output.

The fields of the `WordTemplate' data structure are as follows:

`pszFormat'
     points to a string that contains any "formatting" (non-word)
     information prior to the word.

`pszOrigWord'
     points to a string containing the original input word.

`paWord'
     points to a `NULL'-terminated array of alternative surface forms after
     decapitalization and orthography changes.

`pszNonAlpha'
     points to a string containing any "formatting" (non-word) information
     following the word.

`iCapital'
     is a capitalization flag with one of the following values:

     `NOCAP'
          indicates that there are no uppercase letters in the word.

     `INITCAP'
          indicates that only the first letter of the word (that is, the
          first one that can be uppercase) is uppercase.

     `ALLCAP'
          indicates that there are no lowercase letters in the word, and
          two or more uppercase letters.

     `4-65535'
          indicates that the word is "mixed case", not describable by one
          of the three standard values. The number can be interpreted as a
          bit vector, where `4' means the first letter is capitalized, `8'
          means the second letter is capitalized, and so on. This scheme
          handles only the first 14 characters of the word.

`iOutputFlags & WANT_DECOMPOSITION'
     causes the decomposition fields (`pAnalyses->pszDecomposition') to be
     written to an output file if set (nonzero).

`iOutputFlags & WANT_CATEGORY'
     causes the category fields (`pAnalyses->pszCategory') to be written to
     an output file if set.

`iOutputFlags & WANT_PROPERTIES'
     causes the property fields (`pAnalyses->pszProperties') to be written
     to an output file if set.

`iOutputFlags & WANT_FEATURES'
     causes the feature descriptor fields (`pAnalyses->pszFeatures') to be
     written to an output file if set.

`iOutputFlags & WANT_UNDERLYING'
     causes the underlying form fields (`pAnalyses->pszUnderlyingForm') to
     be written to an output file if set.

`iOutputFlags & WANT_ORIGINAL'
     causes the original word (`pszOrigWord') to be written to an output
     file if set.

`pAnalyses'
     points to a list of morphological parses produced by analysis
     functions, and possibly modified by transfer functions. See
     `WordAnalysis' above.

`pNewWords'
     points to a list of wordforms created by synthesis functions. See
     `StringList' above.

Source File
-----------

`template.h'

The OPAC function library global variables
******************************************

This chapter gives usage information for each of the global variables
found in the OPAC function library. For each global variable, it also
lists the header files to include in your source to obtain the `extern'
declaration for that variable.

pfOutOfMemory_g
===============

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     extern void (* pfOutOfMemory_g)(size_t uiSize_in);

Description
-----------

`pfOutOfMemory_g' points to a function used by `allocMemory' and related
functions whenever `malloc' or `realloc' returns `NULL'. This function has
one argument, the size of the allocation request that failed. It is
assumed that this function does not return normally, so that programs that
use `allocMemory' do not need to check for a successful memory allocation.
This can be satisfied either by aborting the program or by judicious use of
`setjmp' and `longjmp'.

The default value for `pfOutOfMemory_g' is `NULL'. This causes a default
handler to be used that simply displays an error message (using
`szOutOfMemoryMarker_g') and aborts the program.

Example
-------

     #include <stdio.h>
     #include <setjmp.h>
     #include "allocmem.h"
     ...
     static jmp_buf jmpNoMemory_m;

     static void out_of_memory(size_t uiRequest_in)
     {
         fprintf(stderr,
                 "Out of memory requesting %lu bytes---trying to recover\n",
                 (unsigned long)uiRequest_in);
         longjmp(jmpNoMemory_m, 1);
     }

     char * processData(void)
     {
         char * p;

         if (setjmp(jmpNoMemory_m))
         {
             /* free any memory left hanging in mid air */
             ...
             pfOutOfMemory_g = NULL;   /* restore default behavior */
             return NULL;
         }
         pfOutOfMemory_g = out_of_memory;
         p = processSafely();
         pfOutOfMemory_g = NULL;       /* restore default behavior */
         return p;
     }

Source File
-----------

`allocmem.c'

pRecordBuffer_g
===============

Syntax
------

     #include "record.h"   /* or opaclib.h */

     extern char * pRecordBuffer_g;

Description
-----------

`pRecordBuffer_g' points to the dynamically allocated buffer used by
`readStdFormatRecord' for its return value. Allocating this buffer is
handled automatically (but perhaps not optimally) if the programmer does
not allocate it explicitly.

Example
-------

     #include "record.h"
     #include "allocmem.h"

     #define BIG_RECSIZE   16000
     #define SMALL_RECSIZE 500
     ...
     /*
      *  allocate space for records
      */
     pRecordBuffer_g = (char *)allocMemory(BIG_RECSIZE);
     uiRecordBufferSize_g = BIG_RECSIZE;
     ...
     /*
      *  reduce amount of memory allocated for records
      */
     freeMemory(pRecordBuffer_g);
     pRecordBuffer_g = (char *)allocMemory(SMALL_RECSIZE);
     uiRecordBufferSize_g = SMALL_RECSIZE;
     ...
     /*
      *  release memory allocated for records
      */
     cleanupAfterStdFormatRecord();

Source File
-----------

`record.c'

szOutOfMemoryMarker_g
=====================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     extern char szOutOfMemoryMarker_g[/*101*/];

Description
-----------

`szOutOfMemoryMarker_g' is a character array used by `allocMemory' and
related functions whenever `malloc' or `realloc' returns `NULL' and
`pfOutOfMemory_g' is `NULL'. The contents of the character array are used
as part of the error message notifying the user that a request for more
memory has failed.

By default, `szOutOfMemoryMarker_g' is empty (all `NUL' bytes).
This means that no context-sensitive information is provided in the error
message displayed just before the program aborts.

Example
-------

     #include <string.h>
     #include "allocmem.h"
     ...
     int * piArray;
     ...
     strncpy(szOutOfMemoryMarker_g, "creating huge array", 100);
     piArray = allocMemory(100000 * sizeof(int));

Source File
-----------

`allocmem.c'

szRecordKey_g
=============

Syntax
------

     #include "record.h"   /* or opaclib.h */

     /*#define MAX_RECKEY_SIZE 64*/
     extern char szRecordKey_g[MAX_RECKEY_SIZE];

Description
-----------

`readStdFormatRecord' stores the first `MAX_RECKEY_SIZE-1' characters
following the record marker in `szRecordKey_g'. This key can be useful for
identifying the current record in diagnostic messages, as in the example
below.

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "record.h"
     #include "rpterror.h"
     ...
     void load_dictionary(char *      pszInputFile_in,
                          CodeTable * pCodeTable_in,
                          int         cComment_in)
     {
         FILE *   pInputFP;
         char *   pRecord;
         char *   pszField;
         char *   pszNextField;
         unsigned uiRecordCount = 0;

         pInputFP = fopen(pszInputFile_in, "r");
         if (pInputFP == NULL)
         {
             reportError(WARNING_MSG, "Cannot open dictionary file %s\n",
                         pszInputFile_in);
             return;
         }
         while ((pRecord = readStdFormatRecord(pInputFP, pCodeTable_in,
                                               cComment_in,
                                               &uiRecordCount)) != NULL)
         {
             pszField = pRecord;
             while (*pszField)
             {
                 pszNextField = pszField + strlen(pszField) + 1;
                 switch (*pszField)
                 {
                     case 'A':
                         ...
                         break;
                     case 'B':
                         ...
                         break;
                     ...
                     default:
                         reportError(WARNING_MSG,
                  "Warning: unrecognized field in record %u (%s)\n%s\n",
                                     uiRecordCount, szRecordKey_g, pszField);
                         break;
                 }
                 ...
                 pszField = pszNextField;
             }
             ...
         }
         cleanupAfterStdFormatRecord();
         fclose(pInputFP);
         ...
     }

Source File
-----------

`record.c'

uiRecordBufferSize_g
====================

Syntax
------

     #include "record.h"   /* or opaclib.h */

     extern unsigned uiRecordBufferSize_g;

Description
-----------

`uiRecordBufferSize_g' stores the number of bytes allocated for
`pRecordBuffer_g'.

Example
-------

See the example for `pRecordBuffer_g' above.
Source File
-----------

`record.c'

uiTrieArrayBlockSize_g
======================

Syntax
------

     #include "trie.h"   /* or opaclib.h */

     extern size_t uiTrieArrayBlockSize_g;

Description
-----------

`Trie' nodes are allocated `uiTrieArrayBlockSize_g' nodes at a time for
efficiency. The default value for `uiTrieArrayBlockSize_g' is 2000, which
minimizes the number of calls to `allocateMemory', but potentially wastes
several thousand bytes of memory.

Example
-------

     #include "strlist.h"
     #include "trie.h"
     ...
     Trie *       pLexicon = NULL;
     StringList * pNewString;
     ...
     VOIDP addStringToList(VOIDP pNew_in, VOIDP pList_in)
     {
         StringList * pList = pList_in;
         StringList * pNew  = pNew_in;

         pNew->pNext = pList;
         return pNew;
     }
     ...
     uiTrieArrayBlockSize_g = 63;  /* less time efficient, but more space efficient */
     ...
     pNewString = mergeIntoStringList(NULL, "Test value");
     pLexicon = addDataToTrie(pLexicon, pNewString->pszString, pNewString,
                              addStringToList, 3);

Source File
-----------

`trie.c'

The OPAC functions
******************

This chapter gives usage information for each of the functions found in
the OPAC function library. For each function, it also lists the header
files to include in your source to obtain the prototypes and type
definitions relevant to the use of that function.

addDataToTrie
=============

Syntax
------

     #include "trie.h"   /* or opaclib.h */

     Trie * addDataToTrie(Trie *       pTrieHead_io,
                          const char * pszKey_in,
                          void *       pInfo_in,
                          void *    (* pfLinkInfo_in)(void * pNew_in,
                                                      void * pList_io),
                          int          iMaxTrieDepth_in);

Description
-----------

`addDataToTrie' adds information to a trie, using the given insertion key.

The arguments to `addDataToTrie' are as follows:

`pTrieHead_io'
     points to the head of a trie. This may be `NULL' the first time
     `addDataToTrie' is called. Each subsequent call should use the value
     returned by the preceding call.

`pszKey_in'
     points to the insertion key (a character string).

`pInfo_in'
     points to a generic data structure. The exact definition depends on
     the application using the `Trie' for data storage and retrieval.

`pfLinkInfo_in'
     points to a function for adding information to the `pTrieInfo' field
     of the leaf `Trie' data structure found or created for this key. The
     function has two arguments:

     `pNew_in'
          points to a single data item to store.

     `pList_io'
          points to a collection of items stored at a `Trie' node (its
          `pTrieInfo' field), or is `NULL'.

     The function returns the updated pointer to the data collection for
     storing as the value of `pTrieInfo'.

`iMaxTrieDepth_in'
     is the maximum depth to which the trie is built. If this is less than
     the maximum length of key strings, then the data structure stored in
     the trie must include the key as one of its elements for future
     reference.

Return Value
------------

a pointer to the head of the modified trie

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "trie.h"
     #include "rpterror.h"
     #include "allocmem.h"
     ...
     typedef struct lex_item {
         struct lex_item * pLink;       /* link to next item */
         struct lex_item * pNext;       /* link to next homograph */
         unsigned char *   pszForm;     /* lexical form (word) */
         unsigned char *   pszGloss;    /* lexical gloss */
         unsigned short    uiCategory;  /* lexical category */
     } LexItem;
     ...
     Trie *        pLexicon_g;
     unsigned long uiLexiconCount_g;
     static char   szWhitespace_m[7] = " \t\r\n\f\v";
     ...

     static void * add_lex_item(void * pNew_in, void * pList_in)
     {
         LexItem * pLex;
         /*
          *  be a little paranoid
          */
         if (pNew_in == NULL)
             return pList_in;
         /*
          *  link the list of items that start out the same
          */
         ((LexItem *)pNew_in)->pLink = (LexItem *)pList_in;
         /*
          *  link the list of homographs
          */
         for (pLex = (LexItem *)pList_in; pLex != NULL; pLex = pLex->pLink)
         {
             if (strcmp((const char *)((LexItem *)pNew_in)->pszForm,
                        (const char *)pLex->pszForm) == 0)
             {
                 ((LexItem *)pNew_in)->pNext = pLex;
                 break;
             }
         }
         return pNew_in;
     }

     void load_lexicon(char * pszLexiconFile_in)
     {
         FILE *    pLexiconFP;
         char      szBuffer[512];
         char *    pszForm;
         char *    pszGloss;
         char *    pszCategory;
         LexItem * pLexItem;

         if (pszLexiconFile_in == NULL)
         {
             reportError(WARNING_MSG, "Missing input lexicon filename\n");
             return;
         }
         pLexiconFP = fopen(pszLexiconFile_in, "r");
         if (pLexiconFP == NULL)
         {
             reportError(WARNING_MSG,
                         "Cannot open lexicon file %s for input\n",
                         pszLexiconFile_in);
             return;
         }
         while (fgets(szBuffer, sizeof(szBuffer), pLexiconFP) != NULL)
         {
             pszForm     = strtok(szBuffer, szWhitespace_m);
             pszGloss    = strtok(NULL, szWhitespace_m);
             pszCategory = strtok(NULL, szWhitespace_m);
             if ((pszForm == NULL) || (pszGloss == NULL) ||
                 (pszCategory == NULL))
                 continue;
             pLexItem = (LexItem *)allocateMemory((unsigned)sizeof(LexItem));
             pLexItem->pLink      = NULL;
             pLexItem->pNext      = NULL;
             pLexItem->pszForm    = (unsigned char *)duplicateString(pszForm);
             pLexItem->pszGloss   = (unsigned char *)duplicateString(pszGloss);
             pLexItem->uiCategory = index_lexical_category(pszCategory);
             pLexicon_g = addDataToTrie(pLexicon_g, pszForm, pLexItem,
                                        add_lex_item, 3);
             ++uiLexiconCount_g;
         }
         fclose(pLexiconFP);
     }

Source File
-----------

`trie.c'

addLowerUpperWFChars
====================

Syntax
------

     #include "textctl.h"   /* or template.h or opaclib.h */

     void addLowerUpperWFChars(char *        pszLUPairs_in,
                               TextControl * pTextCtl_io);

Description
-----------

`addLowerUpperWFChars' scans the input string for character pairs.
The first member of each pair is added to the set of (multibyte)
lowercase alphabetic characters, and the second member is added to the
set of (multibyte) uppercase alphabetic characters.  Note that there may
be a many-to-many mapping between lowercase and uppercase characters.

The arguments to `addLowerUpperWFChars' are as follows:

`pszLUPairs_in'
     points to a string containing lowercase/UPPERCASE character pairs.
     Whitespace characters in the string are ignored.

`pTextCtl_io'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

none

Example
-------

     #include "textctl.h"
     ...
     TextControl sTextInputCtl_m;
     ...
     void set_alphabetic(pszField_in)
     char * pszField_in;
     {
     int    code;
     char * psz;

     psz  = pszField_in;
     code = *psz++;
     switch (code)
         {
         case 'A':   /* alphabetic (word formation) characters */
             addWordFormationChars(psz, &sTextInputCtl_m);
             break;
         case 'L':   /* lower-upper word formation characters */
             addLowerUpperWFChars(psz, &sTextInputCtl_m);
             break;
         case 'a':   /* multibyte alphabetic (word formation) characters */
             addWordFormationCharStrings(psz, &sTextInputCtl_m);
             break;
         case 'l':   /* multibyte lower-upper word formation characters */
             addLowerUpperWFCharStrings(psz, &sTextInputCtl_m);
             break;
         default:
             break;
         }
     }

     void reset_alphabetic()
     {
     resetWordFormationChars(&sTextInputCtl_m);
     }

Source File
-----------

`myctype.c'

addLowerUpperWFCharStrings
==========================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     void addLowerUpperWFCharStrings(char *        pszLUPairs_in,
                                     TextControl * pTextCtl_io);

Description
-----------

`addLowerUpperWFCharStrings' scans the input string for pairs of
multibyte characters.  The first member of each pair is added to the
set of multibyte lowercase alphabetic characters, and the second member
is added to the set of multibyte uppercase alphabetic characters.
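To make the pair-scanning step concrete, here is a minimal sketch that splits a whitespace-separated string into lowercase/UPPERCASE pairs and appends them to a fixed-size table. It is not the library's implementation. The `CasePair' structure, the table capacity, the per-character byte limit, and the names `parseCasePairsSketch' and `nextToken' are all hypothetical, because the layout of `TextControl' is not described here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_CASE_PAIRS 64  /* hypothetical table capacity */
#define MAX_CHAR_BYTES 8   /* hypothetical longest multibyte character */

/* One lowercase/UPPERCASE mapping (hypothetical layout). */
typedef struct case_pair {
    char szLower[MAX_CHAR_BYTES + 1];
    char szUpper[MAX_CHAR_BYTES + 1];
} CasePair;

/* Copy the next whitespace-delimited token from *ppsz_io into szOut_out,
 * truncating it to MAX_CHAR_BYTES bytes.  Returns 1 if a token was found,
 * 0 at the end of the string. */
static int nextToken(const char ** ppsz_io, char * szOut_out)
{
    const char * p = *ppsz_io;
    size_t       n = 0;

    while (*p != '\0' && isspace((unsigned char)*p))
        ++p;
    if (*p == '\0')
        return 0;
    while (*p != '\0' && !isspace((unsigned char)*p))
    {
        if (n < MAX_CHAR_BYTES)
            szOut_out[n++] = *p;
        ++p;
    }
    szOut_out[n] = '\0';
    *ppsz_io = p;
    return 1;
}

/* Scan "lower UPPER lower UPPER ..." and append each pair to aPairs_io,
 * starting at index uiCount_in.  Returns the new number of pairs. */
size_t parseCasePairsSketch(const char * pszLUPairs_in,
                            CasePair *   aPairs_io,
                            size_t       uiCount_in)
{
    const char * p = pszLUPairs_in;
    char         szLower[MAX_CHAR_BYTES + 1];
    char         szUpper[MAX_CHAR_BYTES + 1];

    assert(pszLUPairs_in != NULL && aPairs_io != NULL);
    while (uiCount_in < MAX_CASE_PAIRS
           && nextToken(&p, szLower)
           && nextToken(&p, szUpper))
    {
        strcpy(aPairs_io[uiCount_in].szLower, szLower);
        strcpy(aPairs_io[uiCount_in].szUpper, szUpper);
        ++uiCount_in;
    }
    return uiCount_in;
}
```

This sketch silently ignores an unpaired trailing token. The description above does not say how the library treats that case.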
Note that there may be a many-to-many mapping between lowercase and
uppercase multibyte characters.

The arguments to `addLowerUpperWFCharStrings' are as follows:

`pszLUPairs_in'
     points to a string containing multibyte lowercase/UPPERCASE
     character pairs.  Whitespace separates the multibyte characters
     from each other, and the pairs from each other.

`pTextCtl_io'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

none

Example
-------

See the example for `addLowerUpperWFChars' above.

Source File
-----------

`myctype.c'

addStringClass
==============

Syntax
------

     #include "strclass.h"  /* or change.h or textctl.h or template.h
                               or opaclib.h */

     StringClass * addStringClass(char *        pszField_in,
                                  StringClass * pClasses_io);

Description
-----------

`addStringClass' adds a string class to the list of string classes.
String classes are used in string environments such as those in the
consistent change notation supported by the OPAC function library.

The arguments to `addStringClass' are as follows:

`pszField_in'
     points to a string containing a string class definition: the class
     name followed by the set of members.

`pClasses_io'
     points to the list of string classes.  This may be `NULL' the first
     time `addStringClass' is called.  Each subsequent call should pass
     the value returned by the preceding call.

Return Value
------------

a pointer to the head of the updated list of string classes

Example
-------

     #include "change.h"   /* includes strclass.h */
     ...
     static Change *      pChanges_m = NULL;
     static StringClass * pClasses_m = NULL;
     ...
     void store_change_info(pszField_in)
     char * pszField_in;
     {
     Change * pChg;
     char *   psz;
     int      code;

     if (pszField_in == NULL)
         return;
     psz  = pszField_in;
     code = *psz++;          /* grab the table code */
     switch (code)
         {
         case 'C':           /* change */
             pChg = parseChangeString( psz, pClasses_m );
             if (pChg != (Change *)NULL)
                 {
                 pChg->pNext = pChanges_m;
                 pChanges_m  = pChg;
                 }
             break;
         case 'S':           /* string class */
             pClasses_m = addStringClass( psz, pClasses_m );
             break;
         default:
             break;
         }
     }

Source File
-----------

`strcla.c'

addToStringList
===============

Syntax
------

     #include "strlist.h"

     StringList * addToStringList(StringList * pList_in,
                                  const char * pszString_in);

Description
-----------

`addToStringList' adds a string to the beginning of a list of strings.
It does not check whether the string is already in the list.

The arguments to `addToStringList' are as follows:

`pList_in'
     points to a list of strings.  It may be `NULL' to signal an empty
     list.

`pszString_in'
     points to a `NUL'-terminated character string.

Return Value
------------

a pointer to the revised list

Example
-------

     #include "strlist.h"
     ...
     StringList * pStrings = NULL;
     ...
     /* pStrings-->NULL */
     pStrings = addToStringList(pStrings, "this");
     /* pStrings-->"this"-->NULL */
     pStrings = addToStringList(pStrings, "test");
     /* pStrings-->"test"-->"this"-->NULL */
     pStrings = addToStringList(pStrings, "is");
     /* pStrings-->"is"-->"test"-->"this"-->NULL */
     pStrings = addToStringList(pStrings, "a");
     /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
     pStrings = addToStringList(pStrings, "test");
     /* pStrings-->"test"-->"a"-->"is"-->"test"-->"this"-->NULL */

Source File
-----------

`add_sl.c'

addWordFormationChars
=====================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     void addWordFormationChars(char *        pszLetters_in,
                                TextControl * pTextCtl_io);

Description
-----------

`addWordFormationChars' scans the input string for non-whitespace
characters.
Each such character is added to the set of alphabetic characters that
do not have a lowercase/UPPERCASE distinction.  (An English example
would be the apostrophe character.)

The arguments to `addWordFormationChars' are as follows:

`pszLetters_in'
     points to a string containing (caseless) alphabetic characters.

`pTextCtl_io'
     points to a data structure that contains orthographic information.

Return Value
------------

none

Example
-------

See the example for `addLowerUpperWFChars' above.

Source File
-----------

`myctype.c'

addWordFormationCharStrings
===========================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     void addWordFormationCharStrings(char *        pszLetters_in,
                                      TextControl * pTextCtl_io);

Description
-----------

`addWordFormationCharStrings' scans the input string for multibyte
characters.  Each such multibyte character sequence is added to the set
of multibyte caseless alphabetic characters.

The arguments to `addWordFormationCharStrings' are as follows:

`pszLetters_in'
     points to a string containing multibyte (caseless) alphabetic
     characters.  Whitespace separates the multibyte characters.

`pTextCtl_io'
     points to a data structure that contains orthographic information.

Return Value
------------

none

Example
-------

See the example for `addLowerUpperWFChars' above.

Source File
-----------

`myctype.c'

allocMemory
===========

Syntax
------

     #include "allocmem.h"  /* or opaclib.h */

     void * allocMemory(size_t uiSize_in);

Description
-----------

`allocMemory' provides a "safe" interface to `malloc'.  If the requested
memory cannot be allocated, the function pointed to by
`pfOutOfMemory_g' is called.  If `pfOutOfMemory_g' is `NULL', then the
default behavior is to display an error message incorporating the string
stored in `szOutOfMemoryMarker_g' and abort the program.  It is assumed
that `allocMemory' always returns a good value.
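The contract described above can be sketched as a thin wrapper around `malloc'. This is a hypothetical illustration, not the library source. `allocMemorySketch', `pfOutOfMemorySketch_g', and `szOutOfMemoryMarkerSketch_g' stand in for `allocMemory', `pfOutOfMemory_g', and `szOutOfMemoryMarker_g', and the handler signature (taking the requested size) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for pfOutOfMemory_g and szOutOfMemoryMarker_g. */
void (*pfOutOfMemorySketch_g)(size_t uiSize_in) = NULL;
const char * szOutOfMemoryMarkerSketch_g = "myprogram";

/* Allocate uiSize_in bytes; never returns NULL.  On failure, call the
 * handler (expected to abort or longjmp).  If there is no handler, or it
 * returns anyway, print a message and abort. */
void * allocMemorySketch(size_t uiSize_in)
{
    /* request at least one byte so that a zero-size request still
       yields a non-NULL pointer */
    void * p = malloc(uiSize_in != 0 ? uiSize_in : 1);

    if (p == NULL)
    {
        if (pfOutOfMemorySketch_g != NULL)
            (*pfOutOfMemorySketch_g)(uiSize_in);
        fprintf(stderr, "%s: out of memory (%lu bytes requested)\n",
                szOutOfMemoryMarkerSketch_g, (unsigned long)uiSize_in);
        abort();
    }
    return p;
}
```

If a handler does return, this sketch falls back to the default message and `abort', so the caller still never sees `NULL'.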
Because `allocMemory' is assumed never to return `NULL', any function
pointed to by `pfOutOfMemory_g' must either abort the program or use
`longjmp' to escape to a safe place in the program.

`allocMemory' has a single argument:

`uiSize_in'
     is the number of bytes to allocate.

Return Value
------------

a pointer to the beginning of the memory area allocated

Example
-------

     #include "allocmem.h"
     ...
     char * p;
     ...
     p = allocMemory(75);

Source File
-----------

`allocmem.c'

applyChanges
============

Syntax
------

     #include "change.h"  /* or textctl.h or template.h or opaclib.h */

     char * applyChanges(const char *   pszString_in,
                         const Change * pChangeList_in);

Description
-----------

`applyChanges' applies a list of consistent changes to a string.  The
function steps through the list of changes, applying each change as
often as necessary before trying the next change in the list.  The input
string is not changed; rather, a copy is created, modified, and
returned.

The arguments to `applyChanges' are as follows:

`pszString_in'
     points to a string to be changed.

`pChangeList_in'
     points to a list of changes to apply to the string.

Return Value
------------

a pointer to a dynamically allocated and (possibly) changed string

Example
-------

     #include "change.h"
     ...
     Change * pChanges_m;
     ...
     char * pszChanged;
     ...
     pszChanged = applyChanges("this is a test", pChanges_m);
     ...
     freeMemory( pszChanged );

Source File
-----------

`change.c'

buildAdjustedFilename
=====================

Syntax
------

     #include "opaclib.h"

     char * buildAdjustedFilename(const char * pszFilename_in,
                                  const char * pszBasePathname_in,
                                  const char * pszExtension_in);

Description
-----------

`buildAdjustedFilename' builds a filename from the pieces given.  If the
base pathname contains directory information, and the input filename is
not an absolute pathname, the leading directory information is added to
the output filename.
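The directory rule in the preceding paragraph can be illustrated with a short sketch. It is hypothetical and deliberately narrow. `prefixBaseDirectorySketch' treats only `/' as a directory separator, uses plain `malloc', and ignores the extension handling entirely, so it should not be read as the code in `adjfname.c'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'd path.  If pszFilename_in is relative and
 * pszBasePathname_in contains a directory part, that directory (including
 * its trailing '/') is prepended.  Only '/' is recognized as a separator.
 * The caller frees the result; NULL is returned if malloc fails. */
char * prefixBaseDirectorySketch(const char * pszFilename_in,
                                 const char * pszBasePathname_in)
{
    const char * pszSlash = NULL;
    size_t       uiDirLength = 0;
    char *       pszResult;

    assert(pszFilename_in != NULL);
    if (pszBasePathname_in != NULL && pszFilename_in[0] != '/')
        pszSlash = strrchr(pszBasePathname_in, '/');
    if (pszSlash != NULL)
        uiDirLength = (size_t)(pszSlash - pszBasePathname_in) + 1;

    pszResult = malloc(uiDirLength + strlen(pszFilename_in) + 1);
    if (pszResult == NULL)
        return NULL;
    if (uiDirLength > 0)
        memcpy(pszResult, pszBasePathname_in, uiDirLength);
    strcpy(pszResult + uiDirLength, pszFilename_in);
    return pszResult;
}
```

For example, an include file "sub.ctl" named inside "/home/me/main.ctl" becomes "/home/me/sub.ctl", while an absolute name such as "/etc/x.ctl" is left unchanged.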
If the extension is given, and the input filename does not have an
extension, the extension is added to the output filename if the file
cannot be opened for input without it.

The arguments to `buildAdjustedFilename' are as follows:

`pszFilename_in'
     points to a filename string.

`pszBasePathname_in'
     points to a base file pathname string, or is `NULL'.

`pszExtension_in'
     points to a filename extension string, or is `NULL'.

Return Value
------------

a pointer to a dynamically allocated filename string

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "opaclib.h"
     ...
     int readControlFile(char * pszControlFile_in)
     {
     char * pszIncludeFile;
     char   szBuffer[512];
     FILE * pControlFP;
     char * p;

     pControlFP = fopen(pszControlFile_in, "r");
     if (pControlFP == NULL)
         return 0;
     while (fgets(szBuffer, 512, pControlFP) != NULL)
         {
         p = strchr(szBuffer, '\n');      /* strip the trailing newline */
         if (p != NULL)
             *p = '\0';
         if (strncmp(szBuffer, "\\include", 8) == 0)
             {
             pszIncludeFile = szBuffer + 8;
             pszIncludeFile += strspn(pszIncludeFile, " \t\r\n\f");
             if (*pszIncludeFile == '\0')
                 continue;
             pszIncludeFile = buildAdjustedFilename(pszIncludeFile,
                                                    pszControlFile_in,
                                                    ".ctl");
             readControlFile(pszIncludeFile);
             freeMemory(pszIncludeFile);
             }
         ...
         }
     fclose(pControlFP);
     return 1;
     }

Source File
-----------

`adjfname.c'

buildChangeString
=================

Syntax
------

     #include "change.h"  /* or textctl.h or template.h or opaclib.h */

     char * buildChangeString(const Change * pChange_in);

Description
-----------

`buildChangeString' builds a textual representation of the given
consistent change data structure.

`buildChangeString' has one argument:

`pChange_in'
     points to a single consistent change data structure.  (The `pNext'
     field of the `Change' data structure is ignored.)

Return Value
------------

a pointer to a dynamically allocated string representing the change, or
`NULL' if an error occurs

Example
-------

     #include "change.h"
     ...
     void displayChangeList(Change * pChanges_in)
     {
     Change * pChange;
     char *   pszChange;

     for ( pChange = pChanges_in ; pChange ; pChange = pChange->pNext )
         {
         pszChange = buildChangeString( pChange );
         if (pszChange == NULL)       /* buildChangeString failed */
             continue;
         fprintf(stderr, "%s\n", pszChange);
         freeMemory( pszChange );
         }
     }

Source File
-----------

`change.c'

checkFileError
==============

Syntax
------

     #include <stdio.h>
     #include "opaclib.h"

     void checkFileError(FILE *       pOutputFP_in,
                         const char * pszProcessName_in,
                         const char * pszFilename_in);

Description
-----------

`checkFileError' checks for an error in the output file `pOutputFP_in'
whose name is given by `pszFilename_in'.  If an error occurred, the
output file is deleted and the program exits with an error message.

The arguments to `checkFileError' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pszProcessName_in'
     points to a string indicating where the error occurred.

`pszFilename_in'
     points to the name of the output file.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "opaclib.h"
     ...
     FILE * fp;
     char   filename[100];
     ...
     checkFileError(fp, "Program Name", filename);
     fclose(fp);

Source File
-----------

`fulldisk.c'

cleanupAfterStdFormatRecord
===========================

Syntax
------

     #include "record.h"  /* or opaclib.h */

     void cleanupAfterStdFormatRecord(void);

Description
-----------

`cleanupAfterStdFormatRecord' frees any memory allocated by
`readStdFormatRecord'.

`cleanupAfterStdFormatRecord' does not have any arguments.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "record.h"

     static CodeTable sLexTable_m = {
         "\\w\0W\0\\c\0C\0\\f\0F\0\\g\0G\0",
         4,
         "\\w"
         };
     ...
     int load_lexicon(pszLexiconFile_in, cComment_in)
     char * pszLexiconFile_in;
     int    cComment_in;
     {
     FILE *   fp;
     unsigned uiRecordCount = 0;
     char *   pRecord;
     /*
      *  open the lexicon file
      */
     if (pszLexiconFile_in == NULL)
         return( 0 );
     fp = fopen(pszLexiconFile_in, "r");
     if (fp == (FILE *)NULL)
         return( 0 );
     /*
      *  load all the records from the lexicon file
      */
     while ((pRecord = readStdFormatRecord(fp, &sLexTable_m, cComment_in,
                                           &uiRecordCount)) != NULL)
         {
         ...
         }
     /*
      *  close the lexicon file and erase the temporary data structures
      */
     fclose(fp);
     cleanupAfterStdFormatRecord();
     return( 1 );
     }

Source File
-----------

`record.c'

convLowerToUpper
================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     const unsigned char * convLowerToUpper(
                                     const unsigned char * pszString_in,
                                     const TextControl *   pTextCtl_in);

Description
-----------

`convLowerToUpper' checks whether the input string begins with a
multibyte lowercase character.  If so, it returns the (first)
corresponding multibyte uppercase character.

This function depends on previous calls to `addLowerUpperWFChars' or
`addLowerUpperWFCharStrings' to establish the mappings between
lowercase and uppercase multibyte characters.  (`addLowerUpperWFChars'
and `addLowerUpperWFCharStrings' are implicitly called by
`loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `convLowerToUpper' are as follows:

`pszString_in'
     points to a `NUL'-terminated character string.

`pTextCtl_in'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

a pointer to a `NUL'-terminated string containing the (primary)
corresponding multibyte uppercase character, or `NULL' if the input
string does not begin with a multibyte lowercase character.  This may
point to a static buffer that may be overwritten by the next call to
`convLowerToUpper'.

Example
-------

     #include "textctl.h"
     ...
     static TextControl   sTextCtl_m;
     static StringClass * pStringClasses_m;
     static char          szOutxFilename_m[100];
     ...
     loadOutxCtlFile(szOutxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
     ...
     unsigned char * upcaseString(unsigned char * pszString_in)
     {
     size_t                iCharSize;
     size_t                iUCSize;
     size_t                iUpperLength = 0;
     unsigned char *       p;
     const unsigned char * pUC;
     unsigned char *       pszUpper;
     unsigned char *       q;

     if (pszString_in == NULL)
         return NULL;
     /*
      *  first pass: compute the length of the uppercased string
      */
     for ( p = pszString_in ; *p ; p += iCharSize )
         {
         if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
             iCharSize = 1;
         pUC = convLowerToUpper(p, &sTextCtl_m);
         if (pUC != NULL)
             iUpperLength += strlen((const char *)pUC);
         else
             iUpperLength += iCharSize;
         }
     pszUpper = allocMemory(iUpperLength + 1);
     /*
      *  second pass: copy each character, converting lowercase ones
      */
     for ( p = pszString_in, q = pszUpper ; *p ; p += iCharSize )
         {
         if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
             iCharSize = 1;
         pUC = convLowerToUpper(p, &sTextCtl_m);
         if (pUC != NULL)
             {
             iUCSize = strlen((const char *)pUC);
             memcpy(q, pUC, iUCSize);
             q += iUCSize;
             }
         else
             {
             memcpy(q, p, iCharSize);
             q += iCharSize;
             }
         }
     pszUpper[iUpperLength] = NUL;
     return pszUpper;
     }

Source File
-----------

`myctype.c'

convLowerToUpperSet
===================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     const StringList * convLowerToUpperSet(
                                     const unsigned char * pszString_in,
                                     const TextControl *   pTextCtl_in);

Description
-----------

`convLowerToUpperSet' checks whether the input string begins with a
multibyte lowercase character.  If so, it returns the complete set of
corresponding multibyte uppercase characters.

This function depends on previous calls to `addLowerUpperWFChars' or
`addLowerUpperWFCharStrings' to establish the mappings between
lowercase and uppercase multibyte characters.  (`addLowerUpperWFChars'
and `addLowerUpperWFCharStrings' are implicitly called by
`loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `convLowerToUpperSet' are as follows:

`pszString_in'
     points to a `NUL'-terminated character string.
`pTextCtl_in'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

a pointer to a list of `NUL'-terminated strings containing the
corresponding multibyte uppercase characters, or `NULL' if the input
string does not begin with a multibyte lowercase character.  This may
point to a static buffer that may be overwritten by the next call to
`convLowerToUpperSet'.

Example
-------

     #include "textctl.h"
     #include "rpterror.h"
     ...
     StringList * upcaseWord(pszWord_in, pTextCtl_in)
     char *              pszWord_in;
     const TextControl * pTextCtl_in;
     {
     size_t             uiCharCount;
     size_t             uiLowerCount;
     size_t             uiNumberAlternatives;
     size_t             uiSpan;
     size_t             uiWordLength;
     size_t             k;
     int                iLength;
     unsigned char *    p;
     StringList *       pUpcaseList = NULL;
     const StringList * pUpperSet;
     const StringList * ps;
     /*
      *  count the number of multibyte characters in the string
      *  count the lowercase letters
      *  calculate the number of (ambiguous) upcase conversions
      *  calculate the maximum length of the upcased word
      */
     uiCharCount          = 0;
     uiLowerCount         = 0;
     uiNumberAlternatives = 1;
     uiWordLength         = 1;   /* count the terminating NUL byte */
     for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength )
         {
         iLength = matchAlphaChar(p, pTextCtl_in);
         if (iLength == 0)
             iLength = 1;
         ++uiCharCount;
         if (matchLowercaseChar(p, pTextCtl_in) != 0)
             {
             ++uiLowerCount;
             pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
             uiNumberAlternatives *= getStringListSize( pUpperSet );
             uiSpan = 0;
             for ( ps = pUpperSet ; ps ; ps = ps->pNext )
                 {
                 k = strlen( ps->pszString );
                 if (k > uiSpan)
                     uiSpan = k;
                 }
             }
         else
             uiSpan = iLength;
         uiWordLength += uiSpan;
         }
     if (uiLowerCount == 0)
         {
         /*
          *  the word is already all uppercase
          */
         return addToStringList(NULL, pszWord_in);
         }
     else
         {
         /*
          *  convert word to all uppercase (possibly ambiguously)
          */
         char * pszCapWord;
         char * pszUpper;
         size_t uiNum;
         int    iUpperLength;
         size_t i;
         size_t j;

         if (uiNumberAlternatives < 1)
             {
             reportError(ERROR_MSG,
                         "error getting uppercase equivalents for \"%s\"\n",
                         pszWord_in);
             return NULL;
             }
         if (uiNumberAlternatives > 500)
             {
             reportError(WARNING_MSG,
                  "%lu uppercase equivalents is too many: storing only 500\n",
                         (unsigned long)uiNumberAlternatives);
             uiNumberAlternatives = 500;
             }
         pszCapWord = allocMemory(uiWordLength);
         for ( i = 0 ; i < uiNumberAlternatives ; ++i )
             {
             strcpy(pszCapWord, pszWord_in);
             uiSpan = 1;
             j = 0;
             for ( p = (unsigned char *)pszCapWord ; *p ; p += iLength )
                 {
                 iLength = matchLowercaseChar(p, pTextCtl_in);
                 if (iLength != 0)
                     {
                     pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
                     uiNum     = getStringListSize(pUpperSet);
                     pszUpper  = pUpperSet->pszString;
                     if (uiNum > 1)
                         {
                         /* pick this character's alternative for
                            combination i (a mixed-radix counter) */
                         k = (i / uiSpan) % uiNum;
                         uiSpan *= uiNum;
                         for ( ps = pUpperSet ; ps ; ps = ps->pNext )
                             {
                             if (k == 0)
                                 {
                                 pszUpper = ps->pszString;
                                 break;
                                 }
                             --k;
                             }
                         }
                     /*
                      *  replace the lowercase multibyte character with an
                      *  equivalent uppercase multibyte character
                      */
                     iUpperLength = strlen(pszUpper);
                     if (iUpperLength != iLength)
                         memmove(p + iUpperLength, p + iLength,
                                 strlen((char *)p + iLength) + 1);
                     memcpy(p, pszUpper, iUpperLength);
                     iLength = iUpperLength;
                     }
                 else
                     {
                     iLength = matchAlphaChar(p, pTextCtl_in);
                     if (iLength == 0)
                         iLength = 1;
                     }
                 ++j;
                 }
             pUpcaseList = addToStringList(pUpcaseList, pszCapWord);
             }
         freeMemory( pszCapWord );
         }
     return pUpcaseList;
     }

Source File
-----------

`myctype.c'

convUpperToLower
================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     const unsigned char * convUpperToLower(
                                     const unsigned char * pszString_in,
                                     const TextControl *   pTextCtl_in);

Description
-----------

`convUpperToLower' checks whether the input string begins with a
multibyte uppercase character.  If so, it returns the (first)
corresponding multibyte lowercase character.

This function depends on previous calls to `addLowerUpperWFChars' or
`addLowerUpperWFCharStrings' to establish the mappings between
lowercase and uppercase multibyte characters.
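One way to picture this lookup is as a search through a table of uppercase/lowercase pairs for an uppercase entry that is a prefix of the input string. The sketch below is hypothetical. The `UpperLower' table, the rule that the longest matching uppercase form wins, and the name `upperToLowerSketch' are assumptions made for illustration, not a description of how `TextControl' stores its mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one uppercase form and one lowercase form. */
typedef struct upper_lower {
    const char * pszUpper;
    const char * pszLower;
} UpperLower;

/* Return the lowercase form of the uppercase character that begins
 * pszString_in, or NULL if no entry matches.  The longest matching
 * uppercase form wins; among equally long matches, the earliest table
 * entry wins. */
const char * upperToLowerSketch(const char *       pszString_in,
                                const UpperLower * aTable_in,
                                size_t             uiCount_in)
{
    const char * pszBest = NULL;
    size_t       uiBestLength = 0;
    size_t       i;

    assert(pszString_in != NULL);
    for (i = 0; i < uiCount_in; ++i)
    {
        size_t uiLength = strlen(aTable_in[i].pszUpper);

        if (uiLength > uiBestLength &&
            strncmp(pszString_in, aTable_in[i].pszUpper, uiLength) == 0)
        {
            pszBest = aTable_in[i].pszLower;
            uiBestLength = uiLength;
        }
    }
    return pszBest;
}
```

Returning only the earliest lowercase form for a given uppercase form mirrors the "(first) corresponding" wording above. A set-returning variant such as `convUpperToLowerSet' would collect every matching entry instead.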
(`addLowerUpperWFChars' and `addLowerUpperWFCharStrings' are implicitly
called by `loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `convUpperToLower' are as follows:

`pszString_in'
     points to a `NUL'-terminated character string.

`pTextCtl_in'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

a pointer to a `NUL'-terminated string containing the (primary)
corresponding multibyte lowercase character, or `NULL' if the input
string does not begin with a multibyte uppercase character.  This may
point to a static buffer that may be overwritten by the next call to
`convUpperToLower'.

Example
-------

     #include "textctl.h"
     ...
     static TextControl   sTextCtl_m;
     static StringClass * pStringClasses_m;
     static char          szIntxFilename_m[100];
     ...
     loadIntxCtlFile(szIntxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
     ...
     unsigned char * downcaseString(unsigned char * pszString_in)
     {
     size_t                iCharSize;
     size_t                iLCSize;
     size_t                iLowerLength = 0;
     unsigned char *       p;
     const unsigned char * pLC;
     unsigned char *       pszLower;
     unsigned char *       q;

     if (pszString_in == NULL)
         return NULL;
     /*
      *  first pass: compute the length of the downcased string
      */
     for ( p = pszString_in ; *p ; p += iCharSize )
         {
         if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
             iCharSize = 1;
         pLC = convUpperToLower(p, &sTextCtl_m);
         if (pLC != NULL)
             iLowerLength += strlen((const char *)pLC);
         else
             iLowerLength += iCharSize;
         }
     pszLower = allocMemory(iLowerLength + 1);
     /*
      *  second pass: copy each character, converting uppercase ones
      */
     for ( p = pszString_in, q = pszLower ; *p ; p += iCharSize )
         {
         if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
             iCharSize = 1;
         pLC = convUpperToLower(p, &sTextCtl_m);
         if (pLC != NULL)
             {
             iLCSize = strlen((const char *)pLC);
             memcpy(q, pLC, iLCSize);
             q += iLCSize;
             }
         else
             {
             memcpy(q, p, iCharSize);
             q += iCharSize;
             }
         }
     pszLower[iLowerLength] = NUL;
     return pszLower;
     }

Source File
-----------

`myctype.c'

convUpperToLowerSet
===================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     const StringList *
     convUpperToLowerSet(const unsigned char * pszString_in,
                         const TextControl *   pTextCtl_in);

Description
-----------

`convUpperToLowerSet' checks whether the input string begins with a
multibyte uppercase character.  If so, it returns the complete set of
corresponding multibyte lowercase characters.

This function depends on previous calls to `addLowerUpperWFChars' or
`addLowerUpperWFCharStrings' to establish the mappings between
lowercase and uppercase multibyte characters.  (`addLowerUpperWFChars'
and `addLowerUpperWFCharStrings' are implicitly called by
`loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `convUpperToLowerSet' are as follows:

`pszString_in'
     points to a `NUL'-terminated character string.

`pTextCtl_in'
     points to a data structure that contains orthographic information,
     including the mappings between lowercase and uppercase letters.

Return Value
------------

a pointer to a list of `NUL'-terminated strings containing the
corresponding multibyte lowercase characters, or `NULL' if the input
string does not begin with a multibyte uppercase character.  This may
point to a static buffer that may be overwritten by the next call to
`convUpperToLowerSet'.

Example
-------

     #include "textctl.h"
     #include "rpterror.h"
     ...
     StringList * downcaseWord(pszWord_in, pTextCtl_in)
     char *              pszWord_in;
     const TextControl * pTextCtl_in;
     {
     size_t             uiCharCount;
     size_t             uiUpperCount;
     size_t             uiNumberAlternatives;
     size_t             uiSpan;
     size_t             uiWordLength;
     size_t             k;
     int                iLength;
     unsigned char *    p;
     StringList *       pDowncaseList = NULL;
     const StringList * pLowerSet;
     const StringList * ps;
     /*
      *  count the number of multibyte characters in the string
      *  count the uppercase letters
      *  calculate the number of (ambiguous) downcase conversions
      *  calculate the maximum length of the downcased word
      */
     uiCharCount          = 0;
     uiUpperCount         = 0;
     uiNumberAlternatives = 1;
     uiWordLength         = 1;   /* count the terminating NUL byte */
     for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength )
         {
         iLength = matchAlphaChar(p, pTextCtl_in);
         if (iLength == 0)
             iLength = 1;
         ++uiCharCount;
         if (matchUppercaseChar(p, pTextCtl_in) != 0)
             {
             ++uiUpperCount;
             pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
             uiNumberAlternatives *= getStringListSize( pLowerSet );
             uiSpan = 0;
             for ( ps = pLowerSet ; ps ; ps = ps->pNext )
                 {
                 k = strlen( ps->pszString );
                 if (k > uiSpan)
                     uiSpan = k;
                 }
             }
         else
             uiSpan = iLength;
         uiWordLength += uiSpan;
         }
     if (uiUpperCount == 0)
         {
         /*
          *  the word is already all lowercase
          */
         return addToStringList(NULL, pszWord_in);
         }
     else
         {
         /*
          *  convert word to all lowercase (possibly ambiguously)
          */
         char * pszDecapWord;
         char * pszLower;
         size_t uiNum;
         int    iLowerLength;
         size_t i;
         size_t j;

         if (uiNumberAlternatives < 1)
             {
             reportError(ERROR_MSG,
                         "error getting lowercase equivalents for \"%s\"\n",
                         pszWord_in);
             return NULL;
             }
         if (uiNumberAlternatives > 500)
             {
             reportError(WARNING_MSG,
                  "%lu lowercase equivalents is too many: storing only 500\n",
                         (unsigned long)uiNumberAlternatives);
             uiNumberAlternatives = 500;
             }
         pszDecapWord = allocMemory(uiWordLength);
         for ( i = 0 ; i < uiNumberAlternatives ; ++i )
             {
             strcpy(pszDecapWord, pszWord_in);
             uiSpan = 1;
             j = 0;
             for ( p = (unsigned char *)pszDecapWord ; *p ; p += iLength )
                 {
                 iLength = matchUppercaseChar(p, pTextCtl_in);
                 if (iLength != 0)
                     {
                     pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
                     uiNum     = getStringListSize(pLowerSet);
                     pszLower  = pLowerSet->pszString;
                     if (uiNum > 1)
                         {
                         /* pick this character's alternative for
                            combination i (a mixed-radix counter) */
                         k = (i / uiSpan) % uiNum;
                         uiSpan *= uiNum;
                         for ( ps = pLowerSet ; ps ; ps = ps->pNext )
                             {
                             if (k == 0)
                                 {
                                 pszLower = ps->pszString;
                                 break;
                                 }
                             --k;
                             }
                         }
                     /*
                      *  replace the uppercase multibyte character with an
                      *  equivalent lowercase multibyte character
                      */
                     iLowerLength = strlen(pszLower);
                     if (iLowerLength != iLength)
                         memmove(p + iLowerLength, p + iLength,
                                 strlen((char *)p + iLength) + 1);
                     memcpy(p, pszLower, iLowerLength);
                     iLength = iLowerLength;
                     }
                 else
                     {
                     iLength = matchAlphaChar(p, pTextCtl_in);
                     if (iLength == 0)
                         iLength = 1;
                     }
                 ++j;
                 }
             pDowncaseList = addToStringList(pDowncaseList, pszDecapWord);
             }
         freeMemory( pszDecapWord );
         }
     return pDowncaseList;
     }

Source File
-----------

`myctype.c'

decapitalizeWord
================

Syntax
------

     #include "template.h"  /* or opaclib.h */

     int decapitalizeWord(WordTemplate *      pWord_io,
                          const TextControl * pTextCtl_in);

Description
-----------

`decapitalizeWord' converts the input word to all lowercase (possibly
ambiguously) and returns a capitalization flag:

`0 (NOCAP)'
     The input word had no uppercase letters.

`1 (INITCAP)'
     The input word had a single capital letter at the beginning.

`2 (ALLCAP)'
     The input word had all uppercase letters.

`>4'
     The input word had a mixture of uppercase and lowercase letters.

After the conversion to all lowercase, any orthography changes stored
in `pTextCtl_in' are applied.

The arguments to `decapitalizeWord' are as follows:

`pWord_io'
     points to a data structure that contains the original word and
     receives the decapitalized word.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

the capitalization flag for the word

Example
-------

     #include "template.h"   /* includes textctl.h */
     ...
     WordTemplate * buildTemplate(char * pszWord_in, TextControl * pTextCtl_in)
     {
     WordTemplate * pTemplate;

     if (pszWord_in == NULL)
         return NULL;
     pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
     pTemplate->pszOrigWord = duplicateString( pszWord_in );
     pTemplate->iCapital    = decapitalizeWord( pTemplate, pTextCtl_in);
     return pTemplate;
     }

Source File
-----------

`textin.c'

displayNumberedMessage
======================

Syntax
------

     #include "rpterror.h"  /* or opaclib.h */

     void displayNumberedMessage(const NumberedMessage * pMessage_in,
                                 int                     bSilent_in,
                                 int                     bShowWarnings_in,
                                 FILE *                  pLogFP_in,
                                 const char *            pszFilename_in,
                                 unsigned                uiLineNumber_in,
                                 ...);

Description
-----------

`displayNumberedMessage' writes a numbered error or warning message to
the standard error output (screen), optionally writing it to a log file
as well.

For GUI programs, the programmer must write a different version of
`displayNumberedMessage' to satisfy the link requirements of other
functions in the OPAC library.  This would typically display a message
box or write to a message window.

The arguments to `displayNumberedMessage' are as follows:

`pMessage_in'
     points to a `NumberedMessage' data structure that contains the
     message type, the message number, and the format string for the
     message.

`bSilent_in'
     suppresses all screen output if `TRUE' (nonzero).

`bShowWarnings_in'
     causes warning messages (not just error messages) to be displayed
     if `TRUE' (nonzero).

`pLogFP_in'
     is a `FILE' pointer to an open log file, or is `NULL'.

`pszFilename_in'
     points to the name of the input file in which the error occurred,
     or is `NULL'.

`uiLineNumber_in'
     is the line number in the input file on which the error occurred,
     or is zero (`0').

`...'
     represents any number of additional arguments needed by the
     `printf'-style format string given by `pMessage_in'.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include "opaclib.h"   /* includes rpterror.h */
     ...
     int    bSilent_g       = 0;
     int    bShowWarnings_g = 1;
     FILE * pLogFP_g        = NULL;
     ...
     static NumberedMessage sCannotOpen_m = {
         ERROR_MSG, 100, "Cannot open %s file %s" };
     static NumberedMessage sIgnoreRedundant_m = {
         WARNING_MSG, 101, "Ignoring all but first \\%s line" };
     static char * aszCodes_m[] = {
         "\\lexicon",
         "\\grammar",
         ...
         NULL };
     ...
     FILE *   pControlFP;
     char *   pszControlFile;
     unsigned uiLineNumber;
     char *   pszLexFile = NULL;
     char **  ppszField;
     char *   p;
     unsigned i;
     ...
     pControlFP = fopen(pszControlFile, "r");
     if (pControlFP == (FILE *)NULL)
         {
         displayNumberedMessage(&sCannotOpen_m, bSilent_g, bShowWarnings_g,
                                pLogFP_g, NULL, 0,
                                "control", pszControlFile);
         exit(1);
         }
     uiLineNumber = 1;
     while ((ppszField = readStdFormatField(pControlFP, aszCodes_m, NUL))
                                                                  != NULL)
         {
         switch (**ppszField)
             {
             case 1:                     /* "\\lexicon" */
                 if (pszLexFile != (char *)NULL)
                     displayNumberedMessage(&sIgnoreRedundant_m,
                                            bSilent_g, bShowWarnings_g,
                                            pLogFP_g,
                                            pszControlFile, uiLineNumber,
                                            "lexicon");
                 else
                     {
                     p = strtok(ppszField[0]+1, " \t\r\n\f\v");
                     pszLexFile = buildAdjustedFilename(p, pszControlFile,
                                                        ".lex");
                     }
                 break;
             ...
             }
         ...
         for ( i = 0 ; ppszField[i] ; ++i )
             ++uiLineNumber;
         }
     ...

Source File
-----------

`textin.c'

duplicateString
===============

Syntax
------

     #include "allocmem.h"  /* or opaclib.h */

     char * duplicateString(const char * pszString_in);

Description
-----------

`duplicateString' creates a copy of an existing `NUL'-terminated
character string.  It calls `allocMemory' to get the memory to store the
copy of the string.  If `pszString_in' is `NULL', then `duplicateString'
returns `NULL'.  This is the same as the standard function `strdup',
except that it calls `allocMemory' instead of `malloc'.

`duplicateString' has one argument:

`pszString_in'
     points to a `NUL'-terminated character string.

Return Value
------------

a pointer to the newly allocated and copied duplicate string

Example
-------

     #include "template.h"   /* includes textctl.h */
     ...
     WordTemplate * buildTemplate(char * pszWord_in, TextControl * pTextCtl_in)
     {
     WordTemplate * pTemplate;

     if (pszWord_in == NULL)
         return NULL;
     pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
     pTemplate->pszOrigWord = duplicateString( pszWord_in );
     pTemplate->iCapital    = decapitalizeWord( pTemplate, pTextCtl_in);
     return pTemplate;
     }

Source File
-----------

`allocmem.c'

duplicateStringList
===================

Syntax
------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h
                              or template.h or opaclib.h */

     StringList * duplicateStringList(const StringList * pList_in);

Description
-----------

`duplicateStringList' copies a list of strings to create another,
identical list of strings.  If `pList_in' is `NULL', then
`duplicateStringList' returns `NULL'.

`duplicateStringList' has one argument:

`pList_in'
     points to a list of strings.

Return Value
------------

a pointer to the new list of dynamically allocated strings

Example
-------

     #include "strlist.h"
     ...
     StringList * pList1;
     StringList * pList2;
     ...
     pList2 = duplicateStringList(pList1);
     ...
     freeStringList( pList2 );
     pList2 = NULL;

Source File
-----------

`copy_sl.c'

equivalentStringLists
=====================

Syntax
------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h
                              or template.h or opaclib.h */

     int equivalentStringLists(const StringList * pFirst_in,
                               const StringList * pSecond_in);

Description
-----------

`equivalentStringLists' tests whether or not two string lists contain
the same strings.  The strings do not have to be in the same order in
the two lists, and duplicate strings in either list are immaterial.

The arguments to `equivalentStringLists' are as follows:

`pFirst_in'
     points to a list of strings.

`pSecond_in'
     points to another list of strings.

Return Value
------------

nonzero (TRUE) if the lists are equivalent, otherwise zero (FALSE)

Example
-------

     #include "strlist.h"
     ...
     StringList * pList1;
     StringList * pList2;
     ...
     if (equivalentStringLists(pList1, pList2))
         {
         ...
     }

Source File
-----------

   `equiv_sl.c'

eraseCharsInString
==================

Syntax
------

     #include "opaclib.h"
     char * eraseCharsInString(char *       pszString_io,
                               const char * pszEraseChars_in);

Description
-----------

   `eraseCharsInString' removes from `pszString_io' every character that also appears in `pszEraseChars_in', shortening `pszString_io' in place as a side-effect.

   The arguments to `eraseCharsInString' are as follows:

`pszString_io'
     points to the input (and output) string.

`pszEraseChars_in'
     points to the characters to erase from the input string.

Return Value
------------

   a pointer to the possibly modified string

Example
-------

     #include "opaclib.h"   /* includes allocmem.h */
     ...
     static char szMarkers_m[] = "-=#";
     ...
     static int get_score(pszMarkedWord_in)
     const char * pszMarkedWord_in;
     {
         char * pszWord;
         int    iScore = 0;

         if (pszMarkedWord_in != NULL) {
             pszWord = eraseCharsInString(duplicateString(pszMarkedWord_in),
                                          szMarkers_m);
             ...
             freeMemory(pszWord);
         }
         return iScore;
     }

Source File
-----------

   `erasecha.c'

eraseTrie
=========

Syntax
------

     #include "trie.h"   /* or opaclib.h */
     void eraseTrie(Trie * pTrieHead_io,
                    void (* pfEraseInfo_in)(void * pList_io));

Description
-----------

   `eraseTrie' walks through a trie, freeing all the memory allocated for the trie and for the information it stores.

   The arguments to `eraseTrie' are as follows:

`pTrieHead_io'
     points to the head of a trie.

`pfEraseInfo_in'
     points to a function for erasing the stored information.  The function has one argument:

     `pList_io'
          points to a data collection to erase, presumably by freeing memory.

     The function does not return a value.

Return Value
------------

   none

Example
-------

     #include "trie.h"
     #include "allocmem.h"
     ...
     typedef struct lex_item {
         struct lex_item * pLink;      /* link to next element */
         struct lex_item * pNext;      /* link to next homograph */
         unsigned char *   pszForm;    /* lexical form (word) */
         unsigned char *   pszGloss;   /* lexical gloss */
         unsigned short    uiCategory; /* lexical category */
     } LexItem;
     ...
     Trie *        pLexicon_g;
     unsigned long uiLexiconCount_g;
     ...
     static void erase_lex_item(void * pList)
     {
         LexItem * pItem;
         LexItem * pNextItem;

         for ( pItem = (LexItem *)pList ; pItem ; pItem = pNextItem ) {
             pNextItem = pItem->pLink;
             if (pItem->pszForm != NULL)
                 freeMemory(pItem->pszForm);
             if (pItem->pszGloss != NULL)
                 freeMemory(pItem->pszGloss);
             freeMemory(pItem);
         }
     }

     void free_lexicon(void)
     {
         if (pLexicon_g != NULL) {
             eraseTrie(pLexicon_g, erase_lex_item);
             pLexicon_g = NULL;
         }
         uiLexiconCount_g = 0L;
     }

Source File
-----------

   `trie.c'

exitSafely
==========

Syntax
------

     #include "opaclib.h"
     int exitSafely(int iCode_in);

Description
-----------

   `exitSafely' replaces `exit'.  A program compiled for Microsoft Windows should supply its own definition of `exitSafely' that does not call `exit', because calling `exit' from a Windows program can cause problems.

   `exitSafely' has one argument:

`iCode_in'
     is the program status code to return from the program.

Return Value
------------

   none, although it must be defined as returning `int' for compatibility

Example
-------

     #include <string.h>
     #include "opaclib.h"
     ...
     char * pszCopy;
     ...
     pszCopy = strdup("This is a test!");
     if (pszCopy == NULL) {
         ...
         exitSafely(2);
     }

Source File
-----------

   `safeexit.c'

fcloseWithErrorCheck
====================

Syntax
------

     #include "opaclib.h"
     void fcloseWithErrorCheck(FILE *       pOutputFP_in,
                               const char * pszFilename_in);

Description
-----------

   `fcloseWithErrorCheck' checks the output file for write errors and then closes it.  If an error is detected, it is reported using `reportError'.

   The arguments to `fcloseWithErrorCheck' are as follows:

`pOutputFP_in'
     is an output FILE pointer.

`pszFilename_in'
     points to the name of the output file.
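   The check-then-close behavior described above can be sketched in portable C.  This is a minimal sketch under stated assumptions, not the code in `errcheck.c'.  The name `closeOutputChecked' is hypothetical.  The sketch reports problems with `fprintf' to `stderr' instead of `reportError', and, unlike `fcloseWithErrorCheck', it returns a status so that the caller can test it.

```c
#include <stdio.h>

/*
 * Hypothetical sketch (not the library's errcheck.c): flush and check an
 * output stream for write errors, then close it.  Returns 0 on success,
 * nonzero if a NULL stream was given or any write or close error occurred.
 */
int closeOutputChecked(FILE * pOutputFP_in, const char * pszFilename_in)
{
    int iStatus = 0;

    if (pOutputFP_in == NULL)
        return 1;
    /* push buffered data to the file so that pending write errors show up */
    if (fflush(pOutputFP_in) != 0 || ferror(pOutputFP_in))
        iStatus = 1;
    /* fclose itself can fail; the stream is unusable afterward either way */
    if (fclose(pOutputFP_in) != 0)
        iStatus = 1;
    if (iStatus != 0)
        fprintf(stderr, "Error writing output file %s\n",
                (pszFilename_in != NULL) ? pszFilename_in : "(unnamed)");
    return iStatus;
}
```

   Calling `fflush' before testing `ferror' matters because buffered output may not reach the file, and so may not fail, until it is flushed.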
Return Value
------------

   none

Example
-------

     #include <stdio.h>
     #include "opaclib.h"
     ...
     FILE * pOutput;
     char * pszFilename;
     ...
     pOutput = fopen(pszFilename, "w");
     if (pOutput != NULL) {
         ...
         fcloseWithErrorCheck(pOutput, pszFilename);
         pOutput = NULL;
     }

Source File
-----------

   `errcheck.c'

findDataInTrie
==============

Syntax
------

     #include "trie.h"   /* or opaclib.h */
     void * findDataInTrie(const Trie * pTrieHead_in,
                           const char * pszKey_in);

Description
-----------

   `findDataInTrie' searches the trie for information stored under the given key.  Unless the key is shorter than the maximum depth of the trie, the returned pointer may refer to information stored under other keys as well as under the desired one.  In that case you may need to scan over the list (or array) to get exactly what you want.

   The arguments to `findDataInTrie' are as follows:

`pTrieHead_in'
     points to the head of a trie.

`pszKey_in'
     points to the key string.

Return Value
------------

   a pointer to the generic information found in the trie, or `NULL' if the search fails

Example
-------

     #include <string.h>
     #include "trie.h"
     ...
     typedef struct lex_item {
         struct lex_item * pLink;      /* link to next element */
         struct lex_item * pNext;      /* link to next homograph */
         unsigned char *   pszForm;    /* lexical form (word) */
         unsigned char *   pszGloss;   /* lexical gloss */
         unsigned short    uiCategory; /* lexical category */
     } LexItem;
     ...
     Trie * pLexicon_g;
     ...
     LexItem * find_entries(unsigned char * pszWord_in)
     {
         LexItem * pLex;

         for ( pLex = findDataInTrie(pLexicon_g, (const char *)pszWord_in) ;
               pLex ;
               pLex = pLex->pLink ) {
             if (strcmp((const char *)pLex->pszForm,
                        (const char *)pszWord_in) == 0) {
                 /*
                  * since add_lex_item() links the homographs together,
                  * this points to a list containing only the homographs
                  */
                 return pLex;
             }
         }
         return NULL;
     }

Source File
-----------

   `trie.c'

findStringClass
===============

Syntax
------

     #include "strclass.h"   /* or change.h or textctl.h or template.h
                                or opaclib.h */
     StringClass * findStringClass(const char *        pszName_in,
                                   const StringClass * pClasses_in);

Description
-----------

   `findStringClass' searches a list of string classes for a specific string class by name.

   The arguments to `findStringClass' are as follows:

`pszName_in'
     points to the name of the desired string class.

`pClasses_in'
     points to a collection of string classes to search.

Return Value
------------

   a pointer to the string class found, or `NULL' if not found

Example
-------

     #include "strclass.h"
     #include "rpterror.h"
     ...
     static StringClass * pClasses_m = NULL;
     ...
     StringClass * pClass;
     char *        pszClassName;
     ...
     pClass = findStringClass( pszClassName, pClasses_m);
     if (pClass == NULL)
         reportError(WARNING_MSG, "Undefined class %s\n", pszClassName);
     ...

Source File
-----------

   `strcla.c'

fitAllocStringExactly
=====================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */
     char * fitAllocStringExactly(char * pszString_in);

Description
-----------

   `fitAllocStringExactly' shrinks the allocated buffer to fit the string exactly.  If memory somehow runs out, the program is aborted with an error message.  (See `allocMemory' above for details about this error message.)

   `fitAllocStringExactly' has one argument:

`pszString_in'
     points to a string in a possibly overlarge allocated buffer.

Return Value
------------

   a pointer to the (possibly) reallocated block

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "allocmem.h"
     ...
     char * read_line(FILE * pInputFP_in)
     {
         char * pszBuffer;
         size_t uiBufferSize = 500;
         size_t uiLineLength;

         if ((pInputFP_in == NULL) || feof(pInputFP_in))
             return NULL;
         pszBuffer = allocMemory(uiBufferSize);
         if (fgets(pszBuffer, (int)uiBufferSize, pInputFP_in) == NULL) {
             freeMemory(pszBuffer);
             return NULL;
         }
         while (strchr(pszBuffer, '\n') == NULL) {
             uiBufferSize += 500;
             pszBuffer = reallocMemory(pszBuffer, uiBufferSize);
             uiLineLength = strlen(pszBuffer);
             if (fgets(pszBuffer + uiLineLength,
                       (int)(uiBufferSize - uiLineLength),
                       pInputFP_in) == NULL)
                 break;
         }
         return fitAllocStringExactly( pszBuffer );
     }

Source File
-----------

   `allocmem.c'

fixSynthesizedWord
==================

Syntax
------

     #include "opaclib.h"
     void fixSynthesizedWord(WordTemplate *      pTemplate_io,
                             const TextControl * pTextCtl_in);

Description
-----------

   `fixSynthesizedWord' applies the output orthography changes and recapitalization to the list of synthesized wordforms.  The list is updated to reflect these changes, and to minimize any ensuing ambiguity.

   The arguments to `fixSynthesizedWord' are as follows:

`pTemplate_io'
     points to a data structure that contains the (possibly ambiguous) word synthesis list and capitalization information.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

   none

Example
-------

     #include "template.h"
     ...
     TextControl sTextControl_g;
     ...
     FILE *         pInputFP;
     FILE *         pOutputFP;
     WordTemplate * pWord;
     ...
     for (;;) {
         pWord = readTemplateFromAnalysis(pInputFP, &sTextControl_g);
         if (pWord == NULL)
             break;
         pWord->pNewWords = synthesize_word(pWord->pAnalyses,
                                            &sTextControl_g);
         fixSynthesizedWord(pWord, &sTextControl_g);
         writeTextFromTemplate( pOutputFP, pWord, &sTextControl_g);
         freeWordTemplate( pWord );
     }

Source File
-----------

   `textout.c'

fopenAlways
===========

Syntax
------

     #include "opaclib.h"
     FILE * fopenAlways(char *       pszFilename_io,
                        const char * pszMode_in);

Description
-----------

   `fopenAlways' opens a file, prompting the user if necessary and retrying until it succeeds.  It uses `fopen' to open the file, and repeatedly prompts the user for a filename if `fopen' fails.  If `pszFilename_io' is not `NULL', it is updated to contain the name of the file actually opened.  The buffer pointed to by `pszFilename_io' must be at least `FILENAME_MAX' bytes long.  If `FILENAME_MAX' is not defined by `stdio.h', then it is assumed to be 128.

   The arguments to `fopenAlways' are as follows:

`pszFilename_io'
     points to a buffer for holding the name of the file, or is `NULL'.

`pszMode_in'
     points to an `fopen' mode string (usually `"r"' or `"w"').

Return Value
------------

   a valid FILE pointer

Example
-------

     #include <stdio.h>
     #include "opaclib.h"
     ...
     FILE * pInputFP;
     char   szFilename[FILENAME_MAX];
     ...
     pInputFP = fopenAlways(szFilename, "r");
     ...
     fclose(pInputFP);
     pInputFP = NULL;

Source File
-----------

   `ufopen.c'

freeChangeList
==============

Syntax
------

     #include "change.h"   /* or textctl.h or template.h or opaclib.h */
     void freeChangeList(Change * pList_io);

Description
-----------

   `freeChangeList' frees the memory allocated for a list of consistent change structures.

   `freeChangeList' has one argument:

`pList_io'
     points to a list of consistent change structures.

Return Value
------------

   none

Example
-------

     #include "change.h"
     ...
     Change * pChangeList_g;
     ...
     void add_change(char * pszChange_in)
     {
         Change * pTail;

         if (pChangeList_g == NULL)
             pChangeList_g = parseChangeString( pszChange_in );
         else {
             for ( pTail = pChangeList_g ; pTail->pNext ; pTail = pTail->pNext )
                 ;
             pTail->pNext = parseChangeString( pszChange_in );
         }
     }
     ...
     freeChangeList( pChangeList_g );
     pChangeList_g = NULL;

Source File
-----------

   `change.c'

freeCodeTable
=============

Syntax
------

     #include "record.h"
     void freeCodeTable(CodeTable * pCodeTable_io);

Description
-----------

   `freeCodeTable' frees the memory allocated for a `CodeTable' data structure.

   `freeCodeTable' has only one argument:

`pCodeTable_io'
     points to a `CodeTable' data structure that contains information that is no longer needed.

Return Value
------------

   none

Example
-------

     #include "record.h"
     #include "ample.h"

     AmpleData sAmpleData_g;
     char      szCodesFilename_g[100];
     char      szDictFilename_g[100];
     ...
     loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE);
     ...
     loadAmpleDictionary(szDictFilename_g, PFX, &sAmpleData_g);
     freeCodeTable( sAmpleData_g.pPrefixTable );
     sAmpleData_g.pPrefixTable = NULL;

Source File
-----------

   `free_ct.c'

freeMemory
==========

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */
     void freeMemory(void * pBlock_io);

Description
-----------

   `freeMemory' provides a "safe" interface to `free'.  It ignores `NULL' as an argument.  (But passing `NULL' is still poor practice!)  This is the only protection added to `free': passing random memory addresses to `freeMemory', or passing the same address twice, will result in memory corruption and program crashes!

   `freeMemory' has one argument:

`pBlock_io'
     points to a dynamically allocated block of memory to deallocate.

Return Value
------------

   none

Example
-------

     #include <stdio.h>
     #include "allocmem.h"
     ...
     char * read_line(FILE * pInputFP_in)
     {
         char * pszBuffer;
         size_t uiBufferSize = 500;

         if ((pInputFP_in == NULL) || feof(pInputFP_in))
             return NULL;
         pszBuffer = allocMemory(uiBufferSize);
         if (fgets(pszBuffer, (int)uiBufferSize, pInputFP_in) == NULL) {
             freeMemory(pszBuffer);
             return NULL;
         }
         return pszBuffer;
     }

Source File
-----------

   `allocmem.c'

freeStringClasses
=================

Syntax
------

     #include "strclass.h"   /* or change.h or textctl.h or template.h
                                or opaclib.h */
     void freeStringClasses(StringClass * pClasses_io);

Description
-----------

   `freeStringClasses' frees the memory allocated for the list of string classes.

   `freeStringClasses' has one argument:

`pClasses_io'
     points to a list of string classes.

Return Value
------------

   none

Example
-------

     #include "change.h"   /* includes strclass.h */
     ...
     static Change *      pChanges_m;
     static StringClass * pClasses_m;
     ...
     void free_change_info(void)
     {
         freeChangeList( pChanges_m );
         freeStringClasses( pClasses_m );
         pChanges_m = NULL;
         pClasses_m = NULL;
     }

Source File
-----------

   `strcla.c'

freeStringList
==============

Syntax
------

     #include "strlist.h"   /* or strclass.h or change.h or textctl.h
                               or template.h or opaclib.h */
     void freeStringList(StringList * pList_io);

Description
-----------

   `freeStringList' deletes a list of strings, freeing all the memory used by the list of strings.

   `freeStringList' has one argument:

`pList_io'
     points to a list of strings.

Return Value
------------

   none

Example
-------

     #include "strlist.h"
     ...
     StringList * pNames_g;
     ...
     freeStringList(pNames_g);
     pNames_g = NULL;
     ...

Source File
-----------

   `free_sl.c'

freeWordAnalysisList
====================

Syntax
------

     #include "template.h"
     void freeWordAnalysisList(WordAnalysis * pAnalyses_io);

Description
-----------

   `freeWordAnalysisList' frees the memory allocated for a list of `WordAnalysis' data structures.

   `freeWordAnalysisList' has one argument:

`pAnalyses_io'
     points to a list of `WordAnalysis' data structures.
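   Freeing a linked list like the one `freeWordAnalysisList' handles requires saving each node's link before the node itself is freed.  The sketch below shows that pattern on a hypothetical `AnalysisNode' type that stands in for `WordAnalysis', whose actual fields are not described here.  It uses `malloc' and `free' directly rather than the library's `allocMemory' and `freeMemory'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WordAnalysis: a string plus a link. */
typedef struct AnalysisNode {
    char *                pszAnalysis;  /* dynamically allocated string */
    struct AnalysisNode * pNext;        /* next analysis, or NULL */
} AnalysisNode;

/* Prepend a copy of pszText_in; on allocation failure the list is
 * returned unchanged. */
AnalysisNode * prependAnalysis(AnalysisNode * pList_in, const char * pszText_in)
{
    size_t         uiSize = strlen(pszText_in) + 1;
    AnalysisNode * pNode  = malloc(sizeof(AnalysisNode));

    if (pNode == NULL)
        return pList_in;
    pNode->pszAnalysis = malloc(uiSize);
    if (pNode->pszAnalysis == NULL) {
        free(pNode);
        return pList_in;
    }
    memcpy(pNode->pszAnalysis, pszText_in, uiSize);
    pNode->pNext = pList_in;
    return pNode;
}

/* Free every node and its string; return the number of nodes freed. */
unsigned freeAnalysisNodes(AnalysisNode * pList_io)
{
    AnalysisNode * pNext;
    unsigned       uiCount = 0;

    for ( ; pList_io != NULL ; pList_io = pNext) {
        pNext = pList_io->pNext;   /* save the link before freeing the node */
        free(pList_io->pszAnalysis);
        free(pList_io);
        ++uiCount;
    }
    return uiCount;
}
```

   Reading `pList_io->pNext' after `free(pList_io)' would be undefined behavior, which is why the link is copied first.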
Return Value
------------

   none

Example
-------

     #include "template.h"
     ...
     WordTemplate * pWord;
     ...
     if (pWord->pAnalyses != NULL) {
         freeWordAnalysisList(pWord->pAnalyses);
         pWord->pAnalyses = NULL;
     }
     ...

Source File
-----------

   `wordanal.c'

freeWordTemplate
================

Syntax
------

     #include "template.h"   /* or opaclib.h */
     void freeWordTemplate(WordTemplate * pWord_io);

Description
-----------

   `freeWordTemplate' frees everything in a `WordTemplate' data structure, including the structure itself.

   `freeWordTemplate' has one argument:

`pWord_io'
     points to a `WordTemplate' data structure to free.

Return Value
------------

   none

Example
-------

     #include "template.h"
     ...
     TextControl sTextCtl_g;
     ...
     WordAnalysis * merge_analyses(WordAnalysis * pList_in,
                                   WordAnalysis * pAnal_in)
     {
         ...
     }
     ...
     void process(FILE * pInputFP_in, FILE * pOutputFP_in)
     {
         WordTemplate * pWord;
         WordAnalysis * pAnal;
         unsigned       uiAmbiguityCount;
         unsigned long  uiWordCount;
         unsigned       i;

         for ( uiWordCount = 0L ;; ) {
             pWord = readTemplateFromText(pInputFP_in, &sTextCtl_g);
             if (pWord == NULL)
                 break;
             uiAmbiguityCount = 0;
             if (pWord->paWord != NULL) {
                 for ( i = 0 ; pWord->paWord[i] ; ++i ) {
                     pAnal = analyze(pWord->paWord[i]);
                     pWord->pAnalyses = merge_analyses(pWord->pAnalyses,
                                                       pAnal);
                 }
                 for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
                     ++uiAmbiguityCount;
             }
             uiWordCount = showAmbiguousProgress(uiAmbiguityCount,
                                                 uiWordCount);
             writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
             freeWordTemplate(pWord);
         }
     }

Source File
-----------

   `free_wt.c'

getAndClearAllocMemorySum
=========================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */
     unsigned long getAndClearAllocMemorySum(void);

Description
-----------

   `getAndClearAllocMemorySum' returns the amount of memory requested by `allocMemory' calls since the last call to `getAndClearAllocMemorySum'.  Memory released by `freeMemory' is not subtracted, so the result measures how much memory was requested, not how much is currently in use.

   `getAndClearAllocMemorySum' does not have any arguments.
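   The following minimal sketch shows how such a running total might be kept, assuming the counter is a file-static variable that is updated on every allocation.  The names `allocCounted' and `getAndClearAllocSum' are hypothetical stand-ins for `allocMemory' and `getAndClearAllocMemorySum'.  The real `allocMemory' also aborts the program when memory runs out, which this sketch does not do.  As documented above, freed memory is never subtracted from the total.

```c
#include <stdlib.h>

/* Hypothetical running total of bytes requested since the last reset. */
static unsigned long uiAllocSum_m = 0L;

/* Stand-in for allocMemory: add the request to the total, then malloc. */
void * allocCounted(size_t uiSize_in)
{
    uiAllocSum_m += (unsigned long)uiSize_in;
    return malloc(uiSize_in);
}

/* Return the bytes requested since the previous call, and reset the total. */
unsigned long getAndClearAllocSum(void)
{
    unsigned long uiSum = uiAllocSum_m;

    uiAllocSum_m = 0L;
    return uiSum;
}
```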
Return Value
------------

   the number of bytes of memory requested by `allocMemory' calls since the last call to `getAndClearAllocMemorySum'

Example
-------

     #include <stdio.h>
     #include "allocmem.h"
     ...
     char * p;
     ...
     getAndClearAllocMemorySum();        /* reset the counter */
     ...
     p = allocMemory(500);
     ...
     p = duplicateString("this is a test");
     ...
     printf("%lu bytes allocated recently\n", getAndClearAllocMemorySum());

Source File
-----------

   `allocmem.c'

getChangeQuote
==============

Syntax
------

     #include "change.h"   /* or textctl.h or template.h or opaclib.h */
     int getChangeQuote(const char * pszMatch_in,
                        const char * pszReplace_in);

Description
-----------

   `getChangeQuote' finds a suitable "quote" character that is not used in either input string.

   The arguments to `getChangeQuote' are as follows:

`pszMatch_in'
     points to the string to change from.

`pszReplace_in'
     points to the string to change to.

Return Value
------------

   a character suitable for quoting the match and replace strings

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "change.h"
     #include "allocmem.h"

     char * composeChangeString(pszMatch_in, pszReplace_in, pszEnvir_in)
     const char * pszMatch_in;
     const char * pszReplace_in;
     const char * pszEnvir_in;
     {
         char * pszChange;
         size_t uiLength;
         char   cQuote;

         if ((pszMatch_in == NULL) && (pszReplace_in == NULL))
             return NULL;
         if (pszMatch_in == NULL)
             pszMatch_in = "";
         if (pszReplace_in == NULL)
             pszReplace_in = "";
         /* four quotes, one space, and the terminating NUL */
         uiLength = strlen( pszMatch_in ) + strlen( pszReplace_in ) + 6;
         if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
             uiLength += strlen( pszEnvir_in ) + 1;   /* space + environment */
         pszChange = allocMemory(uiLength);
         cQuote = getChangeQuote(pszMatch_in, pszReplace_in);
         sprintf(pszChange, "%c%s%c %c%s%c",
                 cQuote, pszMatch_in,   cQuote,
                 cQuote, pszReplace_in, cQuote);
         if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
             strcat(strcat(pszChange, " "), pszEnvir_in);
         return pszChange;
     }

Source File
-----------

   `change.c'

getStringListSize
=================

Syntax
------

     #include "strlist.h"   /* or
                               strclass.h or change.h or textctl.h
                               or template.h or opaclib.h */
     unsigned getStringListSize(const StringList * pList_in);

Description
-----------

   `getStringListSize' counts the number of strings stored in the list.  It does not check for duplicate strings or for `NULL' string pointers; it simply counts the data structures linked together.

   `getStringListSize' has one argument:

`pList_in'
     points to a list of strings.

Return Value
------------

   the number of strings in the list

Example
-------

     #include <stdio.h>
     #include "strlist.h"
     ...
     void writeAmbigWords(pList_in, cAmbig_in, pOutputFP_in)
     const StringList * pList_in;
     int                cAmbig_in;
     FILE *             pOutputFP_in;
     {
         char szAmbig[2];

         if (pList_in == NULL)
             fprintf(pOutputFP_in, "%c0%c%c", cAmbig_in, cAmbig_in, cAmbig_in);
         else if (pList_in->pNext) {
             fprintf(pOutputFP_in, "%c%u%c",
                     cAmbig_in, getStringListSize(pList_in), cAmbig_in);
             szAmbig[0] = cAmbig_in;
             szAmbig[1] = '\0';
             writeStringList( pList_in, szAmbig, pOutputFP_in );
             fprintf(pOutputFP_in, "%c", cAmbig_in);
         }
         else
             fputs(pList_in->pszString, pOutputFP_in);
     }

Source File
-----------

   `size_sl.c'

identicalStringLists
====================

Syntax
------

     #include "strlist.h"   /* or strclass.h or change.h or textctl.h
                               or template.h or opaclib.h */
     int identicalStringLists(const StringList * pFirstList_in,
                              const StringList * pSecondList_in);

Description
-----------

   `identicalStringLists' checks whether or not two lists of strings are identical, that is, whether they have the same strings in the same order.

   The arguments to `identicalStringLists' are as follows:

`pFirstList_in'
     points to a list of strings.

`pSecondList_in'
     points to another list of strings.

Return Value
------------

   nonzero (TRUE) if the lists are identical, otherwise zero (FALSE)

Example
-------

     #include "strlist.h"
     ...
     StringList * pList1;
     StringList * pList2;
     ...
     if (identicalStringLists(pList1, pList2)) {
         ...
     }

Source File
-----------

   `equal_sl.c'

isMemberOfStringList
====================

Syntax
------

     #include "strlist.h"   /* or strclass.h or change.h or textctl.h
                               or template.h or opaclib.h */
     int isMemberOfStringList(const StringList * pList_in,
                              const char *       pszString_in);

Description
-----------

   `isMemberOfStringList' checks whether a string is stored in a list of strings.

   The arguments to `isMemberOfStringList' are as follows:

`pList_in'
     points to a list of strings.

`pszString_in'
     points to the string to be checked.

Return Value
------------

   nonzero (TRUE) if the string is found in the list, otherwise zero (FALSE)

Example
-------

     #include "strlist.h"
     ...
     static StringList * pFiles_m = NULL;
     ...
     void processFileOnce(const char * pszFile_in)
     {
         if ((pszFile_in != NULL) &&
             !isMemberOfStringList(pFiles_m, pszFile_in)) {
             pFiles_m = mergeIntoStringList(pFiles_m, pszFile_in);
             ...
         }
     }

Source File
-----------

   `membr_sl.c'

isolateWord
===========

Syntax
------

     #include "opaclib.h"
     char * isolateWord(char * pszLine_io);

Description
-----------

   `isolateWord' isolates the "word" pointed to by its argument by replacing the first whitespace character following the word with a `NUL' character.  It then returns a pointer to the beginning of the next "word" in the input string.  `isolateWord' skips over any leading whitespace in the input string before trying to isolate a "word".

   `isolateWord' has one argument:

`pszLine_io'
     points to a `NUL'-terminated character string.

Return Value
------------

   a pointer to the first character of the next word, which may be the `NUL' character at the end of the input string

Example
-------

     #include <string.h>
     #include "opaclib.h"   /* includes strlist.h */
     ...
     StringList * pTraceMorphs_m = NULL;
     ...
     void addTraceMorphs(char * pszLine_in)
     {
         char * pszMorph;
         char * pszEnd;

         if (pszLine_in == NULL)
             return;
         for ( pszMorph = pszLine_in + strspn(pszLine_in, " \r\n\t\f\v") ;
               *pszMorph ;
               pszMorph = pszEnd ) {
             pszEnd = isolateWord( pszMorph );   /* isolate the morpheme */
             if (strcmp(pszMorph, "0") == 0)     /* If 0, put in NUL */
                 *pszMorph = NUL;
             pTraceMorphs_m = mergeIntoStringList(pTraceMorphs_m, pszMorph);
         }
     }

Source File
-----------

   `isolatew.c'

isStringClassMember
===================

Syntax
------

     #include "strclass.h"   /* or change.h or textctl.h or template.h
                                or opaclib.h */
     int isStringClassMember(const char *        pszString_in,
                             const StringClass * pClass_in);

Description
-----------

   `isStringClassMember' searches a string class for a specific string.

   The arguments to `isStringClassMember' are as follows:

`pszString_in'
     points to the string to look for.

`pClass_in'
     points to a string class.

Return Value
------------

   nonzero (TRUE) if the string is found in the class, otherwise zero (FALSE)

Example
-------

     #include "strclass.h"
     ...
     static StringClass * pClasses_m;
     ...
     int isClassMember(const char * pszString_in,
                       const char * pszClassName_in)
     {
         StringClass * pClass;

         pClass = findStringClass(pszClassName_in, pClasses_m);
         if (pClass == NULL)
             return 0;
         return isStringClassMember(pszString_in, pClass);
     }

Source File
-----------

   `strcla.c'

loadIntxCtlFile
===============

Syntax
------

     #include "textctl.h"   /* or template.h or opaclib.h */
     int loadIntxCtlFile(const char *   pszFilename_in,
                         int            cComment_in,
                         TextControl *  pTextCtl_out,
                         StringClass ** ppStringClasses_io);

Description
-----------

   `loadIntxCtlFile' loads a text input control file into memory.  This is a standard format file containing one data record with the following fields (not necessarily in this order):

`\ambig'
     defines the character used to mark ambiguities in the output after processing.  (This does not really belong in a "text input" control file, but exists for historical reasons and is kept for compatibility.)
     The `\ambig' field is optional, and may occur only once.

`\barchar'
     defines the character used to start a short formatting command that consists of this character and the immediately following character.  Its name comes from the use of the vertical bar character (`|') in the S.I.L. Manuscripter program in the early 1980s.

     The `\barchar' field is optional, and may occur only once.  An empty field disables this feature.

`\barcodes'
     defines the characters allowed to follow the `\barchar' character to form formatting commands.  Whitespace (spaces, tabs, or newlines) in this field is optional.

     The `\barcodes' field is optional, and may occur any number of times.  Its effect is cumulative.

`\ch'
     defines an input orthography change to apply to words after they have been decapitalized, but before any other processing takes place.  A change consists of two or three parts, in this order: a match string, a replace string, and an optional environment.  The match string and replace string must be quoted by some character that does not appear in either string.  (ASCII single quotes and double quotes are most often used for this purpose.)  The syntax of the environment is too complicated to discuss here: see Weber 1988 (pages 68-74, 82-83, and 86-90) for details.

     The `\ch' field is optional, and may occur any number of times.  An ordered list of consistent changes is built by the function.  Each change is applied to each input word as many times as necessary before the next change is applied.

`\dsc'
     defines the character used to segment words in the output after processing.  This is typically for dividing words into morphemes.  (This does not really belong in a "text input" control file, but exists for historical reasons and is kept for compatibility.)

     The `\dsc' field is optional, and may occur only once.

`\excl'
     specifies one or more "fields" to exclude from processing in the input file.
     Fields in the input file are marked by formatting commands such as those defined by the `\format' field in the text input control file.  The `\excl' field lists one or more field codes (formatting commands) complete with the leading `\format' character.  Field codes are separated by whitespace (spaces, tabs, or newlines).

     The `\excl' field is optional, and may occur any number of times.  Its effect is cumulative.  If any `\excl' fields occur, then no `\incl' fields are allowed, and all fields in the input file that are not explicitly listed in a `\excl' field will be processed.

`\format'
     defines the character used to start a formatting command in the input text.  The formatting command is assumed to consist of this character and all following contiguous nonwhitespace characters.

     The `\format' field is optional, and may occur only once.

`\incl'
     specifies one or more "fields" to include in processing in the input file.  Fields in the input file are marked by formatting commands such as those defined by the `\format' field in the text input control file.  The `\incl' field lists one or more field codes (formatting commands) complete with the leading `\format' character.  Field codes are separated by whitespace (spaces, tabs, or newlines).

     The `\incl' field is optional, and may occur any number of times.  Its effect is cumulative.  If any `\incl' fields occur, then no `\excl' fields are allowed, and only those fields in the input file that are explicitly listed in a `\incl' field will be processed.

`\luwfc'
     defines one or more "word formation characters" that have distinct lowercase and uppercase forms.  The lowercase form is given first and must be followed by its uppercase form.  The functions that use this information allow several lowercase characters to map onto a single uppercase character, and one lowercase character to map onto several uppercase characters.  Whitespace (spaces, tabs, or newlines) in this field is optional.
     The `\luwfc' field is optional, and may occur any number of times.  Its effect is cumulative.  For lowercase and uppercase forms that are represented by two or more adjacent characters (bytes), use the `\luwfcs' field described below.

`\luwfcs'
     defines one or more "word formation character multigraphs" that have distinct lowercase and uppercase forms.  The lowercase form is given first and must be followed by its uppercase form.  The functions that use this information allow several lowercase character multigraphs to map onto a single uppercase character multigraph, and one lowercase character multigraph to map onto several uppercase character multigraphs.  Whitespace (spaces, tabs, or newlines) in this field is significant: each multigraph is separated from its neighbors by one or more whitespace characters.

     The `\luwfcs' field is optional, and may occur any number of times.  Its effect is cumulative.  Note that `\luwfcs' fields may be used to replace `\luwfc' fields, or the two types of fields may be mixed together in the control file.  The implementation underlying the `\luwfcs' field does not require that the lowercase and uppercase forms occupy the same number of characters (bytes).

`\maxdecap'
     defines the maximum number of alternative decapitalizations to produce when multiple lowercase characters map onto a single uppercase character.  This probably matters only for handling words that are entirely capitalized, as the number of alternatives can grow very rapidly with the length of the word.

     The `\maxdecap' field is optional, and may occur only once.

`\nocap'
     dictates that the orthography does not use capitalization at all.  If this field is present, then the `\luwfc' and `\luwfcs' fields should not be used.

     The `\nocap' field is optional, and may occur only once.

`\noincap'
     dictates that capitalization applies to only the first character of a word, or to all characters of a word, but not to individual characters.
     That is, it tells the program not to attempt to deal with names like `McConnel'.

     The `\noincap' field is optional, and may occur only once.

`\scl'
     defines a string class, presumably for use by one or more orthography input changes.  The first item in the field is the name of the class.  All other items are members of the class.  Items are separated by whitespace (spaces, tabs, or newlines).

     The `\scl' field is optional, and any number of string classes may be defined.  A string class definition must occur before any `\ch' field that uses that string class.

`\wfc'
     defines one or more "word formation characters" that do not have distinct lowercase and uppercase forms.  Whitespace (spaces, tabs, or newlines) in this field is optional.

     The `\wfc' field is optional, and may occur any number of times.  Its effect is cumulative.  For caseless forms that are represented by two or more adjacent characters (bytes), use the `\wfcs' field described below.

`\wfcs'
     defines one or more multibyte "word formation characters" that do not have distinct lowercase and uppercase forms.  Whitespace (spaces, tabs, or newlines) in this field is required to separate the different multibyte characters.

     The `\wfcs' field is optional, and may occur any number of times.  Its effect is cumulative.  Note that `\wfcs' fields may be used to replace `\wfc' fields, or the two types of fields may be mixed together in the control file.

   For more details about this file, see the AMPLE Reference Manual, section 8 `Text Input Control File'.

   The arguments to `loadIntxCtlFile' are as follows:

`pszFilename_in'
     points to the name of the text input control file.

`cComment_in'
     is the character used to initiate comments on lines in the file.

`pTextCtl_out'
     points to a data structure for storing information read from the file.

`ppStringClasses_io'
     is the address of a pointer to a set of string classes possibly used by `\ch' fields or added to by `\scl' fields.
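   The pairing rule for the `\luwfc' field (lowercase form first, then its uppercase form, with whitespace optional) is illustrated by the following sketch.  The sketch is hypothetical: `splitLuwfcField' is not part of the OPAC library, and it does not show how `loadIntxCtlFile' actually stores the pairs in the `TextControl' structure.  It handles only single-byte characters, because multigraphs belong to the `\luwfcs' field.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: split the contents of a \luwfc field into parallel
 * lowercase and uppercase arrays.  Whitespace is skipped; the remaining
 * characters are read two at a time (lowercase first, then uppercase).
 * Each output array holds uiMax_in bytes including the terminating NUL.
 * Returns the number of pairs, or -1 if the last lowercase character has
 * no uppercase partner or the arrays are too small.
 */
int splitLuwfcField(const char * pszField_in, char * pszLower_out,
                    char * pszUpper_out, size_t uiMax_in)
{
    const unsigned char * p;
    size_t                uiPairs    = 0;
    int                   bHaveLower = 0;

    if (uiMax_in == 0)
        return -1;
    for (p = (const unsigned char *)pszField_in ; *p != '\0' ; ++p) {
        if (isspace(*p))
            continue;                     /* whitespace is optional */
        if (!bHaveLower) {
            if (uiPairs + 1 >= uiMax_in)
                return -1;                /* no room for a pair plus the NUL */
            pszLower_out[uiPairs] = (char)*p;
            bHaveLower = 1;
        }
        else {
            pszUpper_out[uiPairs++] = (char)*p;
            bHaveLower = 0;
        }
    }
    pszLower_out[uiPairs] = '\0';
    pszUpper_out[uiPairs] = '\0';
    return bHaveLower ? -1 : (int)uiPairs;
}
```

   Under this reading, `aA bB cC' and `aAbBcC' yield the same three pairs.  A repeated lowercase or uppercase character is simply stored as another pair, which is consistent with the many-to-one mappings described above.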
Return Value
------------

   zero if successful, nonzero if an error occurs

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "textctl.h"   /* includes strclass.h */
     #include "rpterror.h"
     ...
     char          szIntxFilename_g[200];
     TextControl   sTextControl_g;
     StringClass * pStringClasses_g = NULL;
     static TextControl sDefaultTextControl_m = {
         NULL,   /* filename */
         NULL,   /* ordered array of lowercase letters */
         NULL,   /* ordered array of matching uppercase letters */
         NULL,   /* array of caseless letters */
         NULL,   /* list of input orthography changes */
         NULL,   /* list of output (orthography) changes */
         NULL,   /* list of format markers (fields) to include */
         NULL,   /* list of format markers (fields) to exclude */
         '\\',   /* initial character of format markers (field codes) */
         '%',    /* character for marking ambiguities and failures */
         '-',    /* character for marking decomposition */
         '|',    /* initial character of secondary format markers */
         NULL,   /* (Manuscripter) bar codes */
         TRUE,   /* flag whether to capitalize individual letters */
         TRUE,   /* flag whether to decapitalize/recapitalize */
         100     /* maximum number of decapitalization alternatives */
     };
     ...
     memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
     fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
     if (fgets(szIntxFilename_g, sizeof(szIntxFilename_g), stdin) == NULL)
         szIntxFilename_g[0] = NUL;
     /* remove the trailing newline that fgets() keeps */
     szIntxFilename_g[strcspn(szIntxFilename_g, "\n")] = NUL;
     if (szIntxFilename_g[0])
     {
         if (loadIntxCtlFile(szIntxFilename_g, ';', &sTextControl_g,
                             &pStringClasses_g) != 0)
         {
             reportError(ERROR_MSG, "Error reading text control file %s\n",
                         szIntxFilename_g);
         }
     }
     if (   (sTextControl_g.cBarMark == NUL) &&
            (sTextControl_g.pszBarCodes != NULL) )
     {
         freeMemory(sTextControl_g.pszBarCodes);
         sTextControl_g.pszBarCodes = NULL;
     }
     if (   (sTextControl_g.cBarMark != NUL) &&
            (sTextControl_g.pszBarCodes == NULL) )
     {
         sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
                                                         "bdefhijmrsuvyz");
     }

Source File
-----------

`loadintx.c'

loadOutxCtlFile
===============

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     int loadOutxCtlFile(const char *   pszFilename_in,
                         int            cComment_in,
                         TextControl *  pTextCtl_out,
                         StringClass ** ppStringClasses_io);

Description
-----------

`loadOutxCtlFile' loads a text output control file into memory.  This
is a standard format file containing one data record with the following
fields (not necessarily in this order):

`\ambig'
     defines the character used to mark ambiguities in the output after
     processing.  The `\ambig' field is optional, and may occur only
     once.

`\ch'
     defines an output orthography change to apply to words after they
     have been processed, but before they are recapitalized.  A change
     consists of two or three parts, in this order: a match string, a
     replace string, and an optional environment.  The match string and
     replace string must be quoted by some character that does not
     appear in either string.  (ASCII single quotes and double quotes are
     most often used for this purpose.)  The syntax of the environment is
     too complicated to discuss here: see Weber 1988 (pages 68-74, 82-83,
     and 86-90) for details.  The `\ch' field is optional, and may occur
     any number of times.  An ordered list of consistent changes is built
     by the function.
     Each change is applied to each output word as many times as
     necessary before the next change is applied.

`\dsc'
     defines the character used to segment words in the output after
     processing.  This is typically used for dividing words into
     morphemes.  (This does not really belong in a "text output" control
     file, but exists for historical reasons and is kept for
     compatibility.)  The `\dsc' field is optional, and may occur only
     once.

`\format'
     defines the character used to start a formatting command in the
     input text.  The formatting command is assumed to consist of this
     character and all following contiguous nonwhitespace characters.
     The `\format' field is optional, and may occur only once.

`\luwfc'
     defines one or more "word formation characters" that have distinct
     lowercase and uppercase forms.  The lowercase form is given first
     and must be followed by its uppercase form.  The functions that use
     this information allow several lowercase characters to map onto a
     single uppercase character, and one lowercase character to map onto
     several uppercase characters.  Whitespace (spaces, tabs, or
     newlines) in this field is optional.  The `\luwfc' field is
     optional, and may occur any number of times.  Its effect is
     cumulative.  For lowercase and uppercase forms that are represented
     by two or more adjacent characters (bytes), use the `\luwfcs' field
     described below.

`\luwfcs'
     defines one or more "word formation character multigraphs" that have
     distinct lowercase and uppercase forms.  The lowercase form is given
     first and must be followed by its uppercase form.  The functions
     that use this information allow several lowercase character
     multigraphs to map onto a single uppercase character multigraph, and
     one lowercase character multigraph to map onto several uppercase
     character multigraphs.  Whitespace (spaces, tabs, or newlines) in
     this field is significant: each multigraph is separated from its
     neighbors by one or more whitespace characters.
     The `\luwfcs' field is optional, and may occur any number of times.
     Its effect is cumulative.  Note that `\luwfcs' fields may be used to
     replace `\luwfc' fields, or the two types of fields may be mixed
     together in the control file.  The implementation underlying the
     `\luwfcs' field does not require that the lowercase and uppercase
     forms occupy the same number of characters (bytes).

`\scl'
     defines a string class, presumably for use by one or more
     orthography output changes.  The first item in the field is the name
     of the class.  All other items are members of the class.  Items are
     separated by whitespace (spaces, tabs, or newlines).  The `\scl'
     field is optional, and any number of string classes may be defined.
     A string class definition must occur before any `\ch' field that
     uses that string class.

`\wfc'
     defines one or more "word formation characters" that do not have
     distinct lowercase and uppercase forms.  Whitespace (spaces, tabs,
     or newlines) in this field is optional.  The `\wfc' field is
     optional, and may occur any number of times.  Its effect is
     cumulative.  For caseless forms that are represented by two or more
     adjacent characters (bytes), use the `\wfcs' field described below.

`\wfcs'
     defines one or more multibyte "word formation characters" that do
     not have distinct lowercase and uppercase forms.  Whitespace
     (spaces, tabs, or newlines) in this field is required to separate
     the different multibyte characters.  The `\wfcs' field is optional,
     and may occur any number of times.  Its effect is cumulative.  Note
     that `\wfcs' fields may be used to replace `\wfc' fields, or the two
     types of fields may be mixed together in the control file.

Note that these are only a subset of the fields allowed in a text input
control file.  For more details about this file, see the KTEXT
Reference Manual, section 8 `Text Output Control File'.

The arguments to `loadOutxCtlFile' are as follows:

`pszFilename_in'
     points to the name of the text output control file.
`cComment_in'
     is the character used to initiate comments on lines in the file.

`pTextCtl_out'
     points to a data structure for storing information read from the
     file.

`ppStringClasses_io'
     is the address of a pointer to a set of string classes possibly used
     by `\ch' fields or added to by `\scl' fields.

Return Value
------------

zero if successful, nonzero if an error occurs

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "textctl.h"     /* includes strclass.h */
     #include "rpterror.h"
     ...
     char          szOutxFilename_g[200];
     TextControl   sOutputControl_g;
     StringClass * pStringClasses_g = NULL;
     ...
     memset(&sOutputControl_g, 0, sizeof(TextControl));
     fprintf(stderr, "Text Output Control File (xxOUTX.CTL) [none]: ");
     if (fgets(szOutxFilename_g, sizeof(szOutxFilename_g), stdin) == NULL)
         szOutxFilename_g[0] = NUL;
     /* remove the trailing newline that fgets() keeps */
     szOutxFilename_g[strcspn(szOutxFilename_g, "\n")] = NUL;
     if (szOutxFilename_g[0])
     {
         if (loadOutxCtlFile(szOutxFilename_g, ';', &sOutputControl_g,
                             &pStringClasses_g) != 0)
         {
             reportError(ERROR_MSG,
                         "Error reading text output control file %s\n",
                         szOutxFilename_g);
         }
     }

Source File
-----------

`loadoutx.c'

matchAlphaChar
==============

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     int matchAlphaChar(const unsigned char * pszString_in,
                        const TextControl *   pTextCtl_in);

Description
-----------

`matchAlphaChar' checks whether the input string begins with a
multibyte alphabetic (word formation) character.  If so, it returns the
number of bytes in the matched multibyte alphabetic character.  This
function depends on previous calls to `addWordFormationChars',
`addWordFormationCharStrings', `addLowerUpperWFChars', and
`addLowerUpperWFCharStrings' to establish the multibyte alphabetic
characters.  (These functions are implicitly called by
`loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `matchAlphaChar' are as follows:

`pszString_in'
     points to a string to match against.

`pTextCtl_in'
     points to a data structure that contains orthographic information.
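The internal tables that `matchAlphaChar' consults are not documented here, so the following is only a sketch of the matching idea, under stated assumptions: the multibyte characters are held in a plain `NULL'-terminated array (the hypothetical `MultigraphTable' type), and when several entries match, the longest one wins.  Neither `MultigraphTable' nor `matchMultigraph' belongs to the OPAC library, and the real function may resolve overlapping entries differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the multibyte character tables kept in a
 * TextControl; this is NOT the library's actual data layout. */
typedef struct {
    const char * const * ppszMultigraphs;   /* NULL-terminated list */
} MultigraphTable;

/* Return the byte length of the longest table entry that matches the
 * beginning of pszString_in, or 0 if none does.  Assumption: the longest
 * match wins when entries overlap (for example "n" and "ng"). */
static int matchMultigraph(const unsigned char * pszString_in,
                           const MultigraphTable * pTable_in)
{
    const char * const * pp;
    size_t uiBest = 0;

    assert(pTable_in != NULL);
    if (pszString_in == NULL)
        return 0;
    for (pp = pTable_in->ppszMultigraphs; *pp != NULL; ++pp)
    {
        size_t uiLen = strlen(*pp);
        if (uiLen > uiBest &&
            strncmp((const char *)pszString_in, *pp, uiLen) == 0)
            uiBest = uiLen;
    }
    return (int)uiBest;
}
```

A caller walks a word by advancing its pointer by the returned length, falling back to one byte when the result is zero, much as the `getWordCase' example for `matchLowercaseChar' does.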
Return Value
------------

the number of bytes occupied by the multibyte alphabetic character at
the beginning of the input string, or zero if the string does not begin
with a multibyte alphabetic character

Example
-------

See the example for `convLowerToUpper' above.

Source File
-----------

`myctype.c'

matchBeginning
==============

Syntax
------

     #include "opaclib.h"

     int matchBeginning(const char * pszString_in,
                        const char * pszBegin_in);

Description
-----------

`matchBeginning' compares two strings, using the end of the second
string as the cutoff point for the comparison.  It is functionally
equivalent to

     (strncmp(pszString_in, pszBegin_in, strlen(pszBegin_in)) == 0)

The arguments to `matchBeginning' are as follows:

`pszString_in'
     points to a string to examine.

`pszBegin_in'
     points to a string to compare to the beginning of the other string.

Return Value
------------

nonzero (TRUE) if the two strings are equal up to the end of the second
string, otherwise zero (FALSE)

Example
-------

     #include "opaclib.h"
     ...
     char string[100], match[50];
     ...
     if (matchBeginning(string, match))
     {
         ...
     }

Source File
-----------

`matchbeg.c'

matchBeginWithStringClass
=========================

Syntax
------

     #include "strclass.h"  /* or change.h or textctl.h or template.h
                               or opaclib.h */

     size_t matchBeginWithStringClass(const char *        pszString_in,
                                      const StringClass * pClass_in);

Description
-----------

`matchBeginWithStringClass' searches a string class to find a class
member that matches the beginning of a string.  It stops at the first
successful match.

The arguments to `matchBeginWithStringClass' are as follows:

`pszString_in'
     points to a string to match against.

`pClass_in'
     points to a string class to search for a match.

Return Value
------------

the length of the first successful match if found (effectively TRUE),
otherwise zero (FALSE)

Example
-------

     #include "strclass.h"
     ...
     static StringClass * pClasses_m;
     ...
     int matchesClassMemberAtBeginning(const char * pszString_in,
                                       const char * pszClassName_in)
     {
         StringClass * pClass;

         pClass = findStringClass(pszClassName_in, pClasses_m);
         if (pClass == NULL)
             return 0;
         return matchBeginWithStringClass(pszString_in, pClass);
     }

Source File
-----------

`strcla.c'

matchCaselessChar
=================

Syntax
------

     #include "textctl.h"

     int matchCaselessChar(const unsigned char * pszString_in,
                           const TextControl *   pTextCtl_in);

Description
-----------

`matchCaselessChar' checks whether the input string begins with a
multibyte caseless character.  If so, it returns the number of bytes in
the matched multibyte caseless character.  This function depends on
previous calls to `addWordFormationChars' or
`addWordFormationCharStrings' to establish the multibyte caseless
characters.  (`addWordFormationChars' and `addWordFormationCharStrings'
are implicitly called by `loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `matchCaselessChar' are as follows:

`pszString_in'
     points to a string to match against.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

the number of bytes occupied by the multibyte caseless character at the
beginning of the input string, or zero if the string does not begin
with a multibyte caseless character

Example
-------

See the example for `matchLowercaseChar' below.

Source File
-----------

`myctype.c'

matchEnd
========

Syntax
------

     #include "opaclib.h"

     int matchEnd(const char * pszString_in,
                  const char * pszTail_in);

Description
-----------

`matchEnd' compares the second string against the end of the first
string.  It is functionally equivalent to

     ((strlen(pszString_in) < strlen(pszTail_in))
         ? 0
         : (strcmp(pszString_in + strlen(pszString_in) - strlen(pszTail_in),
                   pszTail_in) == 0))

The arguments to `matchEnd' are as follows:

`pszString_in'
     points to a string to examine.

`pszTail_in'
     points to a string to compare to the end of the other string.
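The equivalences given for `matchBeginning' and `matchEnd' can be written out as stand-alone C functions.  The names `matchBeginningSketch' and `matchEndSketch' are hypothetical; this is a restatement of the documented behavior using only the standard string functions, not the library source.

```c
#include <assert.h>
#include <string.h>

/* Documented behavior of matchBeginning: nonzero if pszString_in starts
 * with pszBegin_in.  A sketch, not the library source. */
static int matchBeginningSketch(const char * pszString_in,
                                const char * pszBegin_in)
{
    assert(pszString_in != NULL && pszBegin_in != NULL);
    return strncmp(pszString_in, pszBegin_in, strlen(pszBegin_in)) == 0;
}

/* Documented behavior of matchEnd: nonzero if pszString_in ends with
 * pszTail_in.  The length test comes first so that the pointer arithmetic
 * never moves before the start of pszString_in. */
static int matchEndSketch(const char * pszString_in, const char * pszTail_in)
{
    size_t uiStringLen;
    size_t uiTailLen;

    assert(pszString_in != NULL && pszTail_in != NULL);
    uiStringLen = strlen(pszString_in);
    uiTailLen = strlen(pszTail_in);
    if (uiStringLen < uiTailLen)
        return 0;
    return strcmp(pszString_in + uiStringLen - uiTailLen, pszTail_in) == 0;
}
```

One consequence of both documented equivalences is that an empty second string matches any first string.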
Return Value
------------

nonzero (TRUE) if the second string matches the end of the first
string, otherwise zero (FALSE)

Example
-------

     #include "opaclib.h"
     ...
     char string[100], match[50];
     ...
     if (matchEnd(string, match))
     {
         ...
     }

Source File
-----------

`matchend.c'

matchEndWithStringClass
=======================

Syntax
------

     #include "strclass.h"  /* or change.h or textctl.h or template.h
                               or opaclib.h */

     size_t matchEndWithStringClass(const char *        pszString_in,
                                    const StringClass * pClass_in);

Description
-----------

`matchEndWithStringClass' searches a string class to find a class
member that matches the end of a string.  It stops at the first
successful match.

The arguments to `matchEndWithStringClass' are as follows:

`pszString_in'
     points to a string to match against.

`pClass_in'
     points to a string class to search for a match.

Return Value
------------

the length of the first successful match if found (effectively TRUE),
otherwise zero (FALSE)

Example
-------

     #include "strclass.h"
     ...
     static StringClass * pClasses_m;
     ...
     int matchesClassMemberAtEnd(const char * pszString_in,
                                 const char * pszClassName_in)
     {
         StringClass * pClass;

         pClass = findStringClass(pszClassName_in, pClasses_m);
         if (pClass == NULL)
             return 0;
         return matchEndWithStringClass(pszString_in, pClass);
     }

Source File
-----------

`strcla.c'

matchLowercaseChar
==================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     int matchLowercaseChar(const unsigned char * pszString_in,
                            const TextControl *   pTextCtl_in);

Description
-----------

`matchLowercaseChar' checks whether the input string begins with a
multibyte lowercase character.  If so, it returns the number of bytes
in the matched multibyte lowercase character.  This function depends on
previous calls to `addLowerUpperWFChars' or `addLowerUpperWFCharStrings'
to establish the multibyte lowercase characters.
(`addLowerUpperWFChars' and `addLowerUpperWFCharStrings' are implicitly
called by `loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `matchLowercaseChar' are as follows:

`pszString_in'
     points to a string to match against.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

the number of bytes occupied by the multibyte lowercase character at
the beginning of the input string, or zero if the string does not begin
with a multibyte lowercase character

Example
-------

     #include "textctl.h"

     #define CASELESS (-1)
     #define NOCAP      0
     #define INITCAP    1
     #define ALLCAP     2
     #define MIXCAP     3

     int getWordCase(const unsigned char * pszWord_in,
                     const TextControl *   pTextCtl_in)
     {
         unsigned              uiUpperCount = 0;
         unsigned              uiLowerCount = 0;
         int                   bFirstCap    = 0;
         int                   iLength;
         const unsigned char * p;

         for ( p = pszWord_in ; p && *p ; p += iLength )
         {
             iLength = matchLowercaseChar(p, pTextCtl_in);
             if (iLength != 0)
                 ++uiLowerCount;
             else
             {
                 iLength = matchUppercaseChar(p, pTextCtl_in);
                 if (iLength != 0)
                 {
                     ++uiUpperCount;
                     if (uiLowerCount == 0)
                         bFirstCap = 1;
                 }
                 else
                 {
                     iLength = matchCaselessChar(p, pTextCtl_in);
                     if (iLength == 0)
                         iLength = 1;    /* skip a non-word-formation byte */
                 }
             }
         }
         if ((uiUpperCount == 0) && (uiLowerCount == 0))
             return CASELESS;
         else if (uiUpperCount == 0)
             return NOCAP;
         else if (bFirstCap && (uiUpperCount == 1))
             return INITCAP;
         else if (uiLowerCount == 0)
             return ALLCAP;
         else
             return MIXCAP;
     }

Source File
-----------

`myctype.c'

matchUppercaseChar
==================

Syntax
------

     #include "textctl.h"  /* or template.h or opaclib.h */

     int matchUppercaseChar(const unsigned char * pszString_in,
                            const TextControl *   pTextCtl_in);

Description
-----------

`matchUppercaseChar' checks whether the input string begins with a
multibyte uppercase character.  If so, it returns the number of bytes
in the matched multibyte uppercase character.
This function depends on previous calls to `addLowerUpperWFChars' or
`addLowerUpperWFCharStrings' to establish the multibyte uppercase
characters.  (`addLowerUpperWFChars' and `addLowerUpperWFCharStrings'
are implicitly called by `loadIntxCtlFile' and `loadOutxCtlFile'.)

The arguments to `matchUppercaseChar' are as follows:

`pszString_in'
     points to a string to match against.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

the number of bytes occupied by the multibyte uppercase character at
the beginning of the input string, or zero if the string does not begin
with a multibyte uppercase character

Example
-------

See the example for `matchLowercaseChar' above.

Source File
-----------

`myctype.c'

mergeIntoStringList
===================

Syntax
------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h
                              or template.h or opaclib.h */

     StringList * mergeIntoStringList(StringList * pList_io,
                                      const char * pszString_in);

Description
-----------

`mergeIntoStringList' adds a string to the beginning of a list of
strings if it is not already present in the list.

The arguments to `mergeIntoStringList' are as follows:

`pList_io'
     points to a list of strings.

`pszString_in'
     points to the string to be added.  A copy created with
     `duplicateString' is stored in the list, not the original string
     itself.

Return Value
------------

a pointer to the possibly modified list of strings

Example
-------

     #include "strlist.h"
     ...
     StringList * pStrings = NULL;
     ...
     pStrings = mergeIntoStringList(pStrings, "this");
     /* pStrings-->"this"-->NULL */
     pStrings = mergeIntoStringList(pStrings, "test");
     /* pStrings-->"test"-->"this"-->NULL */
     pStrings = mergeIntoStringList(pStrings, "is");
     /* pStrings-->"is"-->"test"-->"this"-->NULL */
     pStrings = mergeIntoStringList(pStrings, "a");
     /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
     pStrings = mergeIntoStringList(pStrings, "test");
     /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */

Source File
-----------

`add_sl.c'

mergeIntoStringListAtEnd
========================

Syntax
------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h
                              or template.h or opaclib.h */

     StringList * mergeIntoStringListAtEnd(StringList * pList_io,
                                           const char * pszString_in);

Description
-----------

`mergeIntoStringListAtEnd' adds a string to the end of a list of
strings if it is not already present in the list.

The arguments to `mergeIntoStringListAtEnd' are as follows:

`pList_io'
     points to a list of strings.

`pszString_in'
     points to the string to be added.  A copy created with
     `duplicateString' is stored in the list, not the original string
     itself.

Return Value
------------

a pointer to the possibly modified list of strings

Example
-------

     #include "strlist.h"
     ...
     StringList * pStrings = NULL;
     ...
     pStrings = mergeIntoStringListAtEnd(pStrings, "this");
     /* pStrings-->"this"-->NULL */
     pStrings = mergeIntoStringListAtEnd(pStrings, "test");
     /* pStrings-->"this"-->"test"-->NULL */
     pStrings = mergeIntoStringListAtEnd(pStrings, "is");
     /* pStrings-->"this"-->"test"-->"is"-->NULL */
     pStrings = mergeIntoStringListAtEnd(pStrings, "a");
     /* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */
     pStrings = mergeIntoStringListAtEnd(pStrings, "test");
     /* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */

Source File
-----------

`appnd_sl.c'

mergeTwoStringLists
===================

Syntax
------

     #include "strlist.h"  /* or strclass.h or change.h or textctl.h
                              or template.h or opaclib.h */

     StringList * mergeTwoStringLists(StringList * pFirstList_io,
                                      StringList * pSecondList_io);

Description
-----------

`mergeTwoStringLists' merges two lists of strings together to form a
single list.  Any strings in the second list that exist in the first
list are freed.  Neither of the original lists survives this operation.

The arguments to `mergeTwoStringLists' are as follows:

`pFirstList_io'
     points to a list of strings.

`pSecondList_io'
     points to another list of strings.

Return Value
------------

a pointer to the merged list

Example
-------

     #include "strlist.h"
     ...
     StringList * pStrings  = NULL;
     StringList * pStrings1 = NULL;
     StringList * pStrings2 = NULL;
     ...
     pStrings1 = mergeIntoStringListAtEnd(pStrings1, "this");
     pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test");
     pStrings1 = mergeIntoStringListAtEnd(pStrings1, "is");
     pStrings1 = mergeIntoStringListAtEnd(pStrings1, "a");
     pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test");
     pStrings2 = mergeIntoStringList(pStrings2, "that");
     pStrings2 = mergeIntoStringList(pStrings2, "test");
     pStrings2 = mergeIntoStringList(pStrings2, "is");
     pStrings2 = mergeIntoStringList(pStrings2, "good");
     /* pStrings1-->"this"-->"test"-->"is"-->"a"-->NULL */
     /* pStrings2-->"good"-->"is"-->"test"-->"that"-->NULL */
     pStrings = mergeTwoStringLists(pStrings1, pStrings2);
     /* pStrings-->"good"-->"that"-->"this"-->"test"-->"is"-->"a"-->NULL */
     /* pStrings1-->-----------------^ */
     /* pStrings2-->??? */

Source File
-----------

`cat_sl.c'

parseChangeString
=================

Syntax
------

     #include "change.h"  /* or textctl.h or template.h or opaclib.h */

     Change * parseChangeString(const char *        pszString_in,
                                const StringClass * pClassList_in);

Description
-----------

`parseChangeString' parses a string to build a `Change' structure.

The arguments to `parseChangeString' are as follows:

`pszString_in'
     points to a change definition string.

`pClassList_in'
     points to a collection of string classes that may be referenced in
     the environment portion of the change definition.

Return Value
------------

a pointer to a newly allocated `Change' structure, or `NULL' if an
error occurred while parsing the change definition

Example
-------

     #include "change.h"    /* includes strclass.h */
     ...
     Change * addChange(const char *        pszChange_in,
                        Change *            pChanges_io,
                        const StringClass * pClasses_in)
     {
         Change * pChange;
         Change * pTail;

         pChange = parseChangeString(pszChange_in, pClasses_in);
         if (pChange != NULL)
         {
             if (pChanges_io == NULL)
                 return pChange;
             /*
              *  keep the list of changes in the original order
              */
             for ( pTail = pChanges_io ; pTail->pNext ; pTail = pTail->pNext )
                 ;
             pTail->pNext = pChange;
         }
         return pChanges_io;
     }

Source File
-----------

`change.c'

promptUser
==========

Syntax
------

     #include "opaclib.h"

     void promptUser(const char * pszPrompt_in,
                     char *       pszBuffer_out,
                     unsigned     uiBufferSize_in);

Description
-----------

`promptUser' prompts the user, then reads a line of input from the
keyboard (normally the standard input).  If an `EOF' occurs,
`promptUser' tries to reopen the keyboard.

The arguments to `promptUser' are as follows:

`pszPrompt_in'
     points to a prompt message string.

`pszBuffer_out'
     points to an input buffer.

`uiBufferSize_in'
     is the size of the input buffer (not counting space for the
     terminating `NUL').

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <stdlib.h>
     #include "opaclib.h"
     ...
     char   szFilename_g[BUFSIZ+1];
     FILE * pInputFP_g;
     char   szBuffer_g[17];
     long   iRepeatCount_g;
     ...
     promptUser("Data file: ", szFilename_g, BUFSIZ);
     pInputFP_g = fopen(szFilename_g, "r");
     ...
     promptUser("Number of iterations to perform: ", szBuffer_g, 16);
     iRepeatCount_g = strtol(szBuffer_g, NULL, 10);

Source File
-----------

`promptus.c'

readLineFromFile
================

Syntax
------

     #include "opaclib.h"

     char * readLineFromFile(FILE *     pInputFP_in,
                             unsigned * puiLineNumber_io,
                             int        cComment_in);

Description
-----------

`readLineFromFile' reads an arbitrarily long line of input text,
erasing the trailing newline character.  The string returned is
overwritten or freed at the next call to `readLineFromFile'.

The arguments to `readLineFromFile' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`puiLineNumber_io'
     points to a line number counter, or is `NULL'.
`cComment_in'
     is the character that marks the beginning of a comment.

Return Value
------------

the address of the buffer containing the `NUL'-terminated line, or
`NULL' if already at the end of the file

Example
-------

     #include <stdio.h>
     #include "opaclib.h"

     void processFile(const char * pszFilename_in)
     {
         FILE *   pInputFP;
         unsigned uiLineNumber;
         char *   pszLine;

         if (pszFilename_in == NULL)
             return;
         pInputFP = fopen(pszFilename_in, "r");
         if (pInputFP == NULL)
             return;
         uiLineNumber = 1;
         while ((pszLine = readLineFromFile(pInputFP, &uiLineNumber,
                                            ';')) != NULL)
         {
             ...
         }
         printf("%u lines read from %s\n", uiLineNumber, pszFilename_in);
         fclose(pInputFP);
     }

Source File
-----------

`readline.c'

readSentenceOfTemplates
=======================

Syntax
------

     #include "template.h"  /* or opaclib.h */

     WordTemplate ** readSentenceOfTemplates(FILE *        pInputFP_in,
                                             const char *  pszAnaFile_in,
                                             const char *  pszFinalPunct_in,
                                             TextControl * pTextCtl_in,
                                             FILE *        pLogFP_in);

Description
-----------

`readSentenceOfTemplates' reads an arbitrarily long sentence (sequence
of words) from an input analysis file, building an array of
`WordTemplate' data structures.  The sentence is terminated by a
sentence-final punctuation character from `pszFinalPunct_in'.

The arguments to `readSentenceOfTemplates' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`pszAnaFile_in'
     points to the name of the input analysis file.

`pszFinalPunct_in'
     points to a `NUL'-terminated string of punctuation characters that
     mark the end of a sentence.

`pTextCtl_in'
     points to a data structure that contains the decomposition and
     ambiguity marker characters.

`pLogFP_in'
     is an output `FILE' pointer, used to log error messages, or `NULL'.

Return Value
------------

a pointer to a dynamically allocated `NULL'-terminated array of
pointers to dynamically allocated `WordTemplate' structures

Example
-------

     #include <stdio.h>
     #include "template.h"
     #include "allocmem.h"
     #include "rpterror.h"
     ...
     TextControl sTextControl_g;
     static const char szSentenceFinalPunc_m[] = ".!?";
     static const char szCannotOpen_m[] =
                         "Warning: cannot open analysis input file %s\n";
     ...
     unsigned processSentences(const char * pszAnaFile_in, FILE * pLogFP_in)
     {
         FILE *          pInputFP;
         WordTemplate ** pSentence;
         unsigned        uiSentenceCount;
         unsigned        i;
         ...
         pInputFP = fopen(pszAnaFile_in, "r");
         if (pInputFP == NULL)
         {
             reportError(ERROR_MSG, szCannotOpen_m, pszAnaFile_in);
             if (pLogFP_in != NULL)
                 fprintf(pLogFP_in, szCannotOpen_m, pszAnaFile_in);
             return 0;
         }
         for ( uiSentenceCount = 0 ;; ++uiSentenceCount )
         {
             pSentence = readSentenceOfTemplates(pInputFP, pszAnaFile_in,
                                                 szSentenceFinalPunc_m,
                                                 &sTextControl_g, pLogFP_in);
             if (pSentence == NULL)
                 break;
             ...
             for ( i = 0 ; pSentence[i] ; ++i )
                 freeWordTemplate( pSentence[i] );
             freeMemory( pSentence );
         }
         fclose(pInputFP);
         return uiSentenceCount;
     }

Source File
-----------

`senttemp.c'

readStdFormatField
==================

Syntax
------

     #include "opaclib.h"

     char ** readStdFormatField(FILE *        pInputFP_in,
                                const char ** ppszFieldCodes_in,
                                int           cComment_in);

Description
-----------

`readStdFormatField' reads an arbitrarily large text field that starts
with a backslash marker at the beginning of a line.  Each line of the
input field is stored separately in a `NULL'-terminated array of
strings.  If the field code at the beginning matches one of those in
the input array of field codes, it is replaced by a single byte
containing the 1-based index of the matching field code.  Otherwise,
the field code is left intact except that the backslash character is
replaced by the character code `255' (`'\377'').  This function is an
alternative to `readStdFormatRecord', which potentially reads several
fields at a time.

The arguments to `readStdFormatField' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`ppszFieldCodes_in'
     points to a `NULL'-terminated array of field code strings.

`cComment_in'
     is the character used to initiate comments in a line.
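The field-code substitution described above can be illustrated on a single first line.  This is a hypothetical sketch, not the library source: `encodeFieldCode' is an invented name, and it assumes that a field code matches only when the marker in the line is followed by whitespace or the end of the line (so `\lexiconx' does not match `\lexicon').

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical illustration of the first-line encoding that
 * readStdFormatField is documented to perform; NOT the library source.
 * Rewrites pszLine_io in place and returns the 1-based index of the
 * matching field code, or 0 if no code matched.
 */
static int encodeFieldCode(char * pszLine_io, const char ** ppszFieldCodes_in)
{
    size_t uiMarkerLen;
    int    i;

    assert(pszLine_io != NULL && pszLine_io[0] == '\\');
    /* assumption: the marker runs up to the first whitespace character */
    uiMarkerLen = strcspn(pszLine_io, " \t\r\n\f\v");
    for (i = 0; ppszFieldCodes_in[i] != NULL; ++i)
    {
        if (strlen(ppszFieldCodes_in[i]) == uiMarkerLen &&
            strncmp(pszLine_io, ppszFieldCodes_in[i], uiMarkerLen) == 0)
        {
            /* replace the whole marker with one byte holding the index */
            pszLine_io[0] = (char)(i + 1);
            memmove(pszLine_io + 1, pszLine_io + uiMarkerLen,
                    strlen(pszLine_io + uiMarkerLen) + 1);
            return i + 1;
        }
    }
    pszLine_io[0] = '\377';     /* unknown code: keep it, flag the backslash */
    return 0;
}
```

This encoding explains why the `read_control_file' example skips the first byte of `ppszField[0]' before tokenizing it, and why its `default' case prints `ppszField[0] + 1' after a backslash.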
Return Value
------------

a pointer to a dynamically allocated `NULL'-terminated array of
pointers to dynamically allocated lines of text

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "opaclib.h"
     ...
     static char szWhitespace_m[7] = " \t\r\n\f\v";
     ...
     int read_control_file(char * pszControlFile_in)
     {
         int          i;
         char *       pszRuleFile    = NULL;
         char *       pszLexiconFile = NULL;
         char *       pszGrammarFile = NULL;
         StringList * pTraceList     = NULL;
         char *       pszMorph;
         FILE *       pControlFP;
         char **      ppszField;
         char *       pszLine;
         static const char * aszCodes_s[] = {
             "\\rules", "\\lexicon", "\\grammar", "\\trace", ..., NULL
         };

         if (pszControlFile_in == NULL)
             return FALSE;
         pControlFP = fopen(pszControlFile_in, "r");
         if (pControlFP == (FILE *)NULL)
         {
             reportError(WARNING_MSG, "Cannot open control file %s\n",
                         pszControlFile_in);
             return FALSE;
         }
         for (;;)
         {
             ppszField = readStdFormatField(pControlFP, aszCodes_s, NUL);
             if (ppszField == NULL)
                 break;
             switch (**ppszField)
             {
             case 1:     /* "\\rules" */
                 if (pszRuleFile != NULL)
                     reportError(WARNING_MSG,
                                 "Rule file already specified: %s\n",
                                 pszRuleFile);
                 else
                 {
                     for ( i = 0 ; ppszField[i] ; ++i )
                     {
                         pszLine = ppszField[i];
                         if (i == 0)
                             ++pszLine;      /* skip the field code byte */
                         pszRuleFile = strtok(pszLine, szWhitespace_m);
                         if (pszRuleFile != NULL)
                         {
                             /* copy it: the field's lines are freed below */
                             pszRuleFile = duplicateString(pszRuleFile);
                             break;
                         }
                     }
                 }
                 break;
             case 2:     /* "\\lexicon" */
                 if (pszLexiconFile != NULL)
                     reportError(WARNING_MSG,
                                 "Lexicon file already specified: %s\n",
                                 pszLexiconFile);
                 else
                 {
                     for ( i = 0 ; ppszField[i] ; ++i )
                     {
                         pszLine = ppszField[i];
                         if (i == 0)
                             ++pszLine;
                         pszLexiconFile = strtok(pszLine, szWhitespace_m);
                         if (pszLexiconFile != NULL)
                         {
                             pszLexiconFile = duplicateString(pszLexiconFile);
                             break;
                         }
                     }
                 }
                 break;
             case 3:     /* "\\grammar" */
                 if (pszGrammarFile != NULL)
                     reportError(WARNING_MSG,
                                 "Grammar file already specified: %s\n",
                                 pszGrammarFile);
                 else
                 {
                     for ( i = 0 ; ppszField[i] ; ++i )
                     {
                         pszLine = ppszField[i];
                         if (i == 0)
                             ++pszLine;
                         pszGrammarFile = strtok(pszLine, szWhitespace_m);
                         if (pszGrammarFile != NULL)
                         {
                             pszGrammarFile = duplicateString(pszGrammarFile);
                             break;
                         }
                     }
                 }
                 break;
             case 4:     /* "\\trace" */
                 for ( i = 0 ; ppszField[i] ; ++i )
                 {
                     pszLine = ppszField[i];
                     if (i == 0)
                         ++pszLine;
                     for ( pszMorph =
                               strtok(pszLine, szWhitespace_m) ;
                           pszMorph ;
                           pszMorph = strtok(NULL, szWhitespace_m) )
                     {
                         /* mergeIntoStringList stores its own copy */
                         pTraceList = mergeIntoStringList(pTraceList,
                                                          pszMorph);
                     }
                 }
                 break;
             ...
             default:
                 reportError(WARNING_MSG, "Unknown field: \\%s\n",
                             ppszField[0] + 1);
                 break;
             }
             for ( i = 0 ; ppszField[i] ; ++i )
                 freeMemory(ppszField[i]);
             freeMemory(ppszField);
         }
         fclose(pControlFP);
         ...
         return TRUE;
     }

Source File
-----------

`readfiel.c'

readStdFormatRecord
===================

Syntax
------

     #include "record.h"  /* or opaclib.h */

     char * readStdFormatRecord(FILE *            pInputFP_in,
                                const CodeTable * pCodeTable_in,
                                int               cComment_in,
                                unsigned *        puiRecordCount_io);

Description
-----------

`readStdFormatRecord' reads the next record from a standard format
file.  The record is stored in memory as a series of `NUL'-terminated
strings stored consecutively in a single buffer, with the record
terminated by two consecutive `NUL' bytes.  The first character of each
string is either a character representing the field code (if found in
the code table), or a backslash indicating that the field code was not
recognized.  This function is an alternative to `readStdFormatField',
which always reads only one field at a time.

The arguments to `readStdFormatRecord' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`pCodeTable_in'
     points to the field code table used to decode the standard format
     file field code markers.

`cComment_in'
     is a character that marks comments in the input file.

`puiRecordCount_io'
     points to a counter for keeping track of the number of records
     read, or is `NULL'.

Return Value
------------

a pointer to the buffer containing the record, or `NULL' for `EOF'

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "record.h"
     ...
     void loadStdFmtFile(const char * pszFilename_in)
     {
         FILE *   pInputFP;
         char *   pRecord;
         char *   pszField;
         char *   pszNextField;
         char     c;
         unsigned uiRecordCount = 0;
         static CodeTable sCodeTable_s = {
             "\\a\0A\0"
             "\\d\0D\0"
             "\\w\0W\0"
             "\\f\0F\0"
             "\\c\0C\0"
             "\\n\0N\0",
             6,
             "\\a"
         };

         if (pszFilename_in == NULL)
             return;
         pInputFP = fopen(pszFilename_in, "r");
         if (pInputFP == NULL)
             return;
         while ((pRecord = readStdFormatRecord(pInputFP, &sCodeTable_s, ';',
                                               &uiRecordCount)) != NULL)
         {
             pszField = pRecord;
             /* the first byte of each field is its code character */
             while ((c = *pszField++) != '\0')
             {
                 pszNextField = pszField + strlen(pszField) + 1;
                 switch (c)
                 {
                 case 'A':  ...  break;
                 case 'C':  ...  break;
                 case 'D':  ...  break;
                 case 'F':  ...  break;
                 case 'N':  ...  break;
                 case 'W':  ...  break;
                 default:   ...  break;
                 }
                 pszField = pszNextField;
             }
             ...
         }
         cleanupAfterStdFormatRecord();
         fclose(pInputFP);
         return;
     }

Source File
-----------

`record.c'

readTemplateFromAnalysis
========================

Syntax
------

     #include "template.h"  /* or opaclib.h */

     WordTemplate * readTemplateFromAnalysis(FILE *              pInputFP_in,
                                             const TextControl * pTextCtl_in);

Description
-----------

`readTemplateFromAnalysis' fills in a `WordTemplate' data structure
from an AMPLE style analysis file.

The arguments to `readTemplateFromAnalysis' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

a pointer to a dynamically allocated `WordTemplate' data structure, or
`NULL' if either `EOF' or an error occurs

Example
-------

     #include "template.h"
     #include "rpterror.h"
     ...
     void synthesizeFile(char *        pszInputFile_in,
                         char *        pszOutputFile_in,
                         TextControl * pTextCtl_in)
     {
         FILE *         pInputFP;
         FILE *         pOutputFP;
         WordTemplate * pWord;
         WordAnalysis * pAnal;
         ...
     /*
      *  open the files
      */
     if ((pszInputFile_in == NULL) || (pszOutputFile_in == NULL))
         return;
     pInputFP = fopen(pszInputFile_in, "r");
     if (pInputFP == NULL)
         {
         reportError(WARNING_MSG, "Cannot open input file %s\n",
                     pszInputFile_in);
         return;
         }
     pOutputFP = fopen(pszOutputFile_in, "w");
     if (pOutputFP == NULL)
         {
         reportError(WARNING_MSG, "Cannot open output file %s\n",
                     pszOutputFile_in);
         fclose(pInputFP);
         return;
         }
     /*
      *  process the data
      */
     for (;;)
         {
         pWord = readTemplateFromAnalysis(pInputFP, pTextCtl_in);
         if (pWord == NULL)
             break;
         ...
         for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
             {
             ...
             }
         ...
         writeTextFromTemplate( pOutputFP, pWord, pTextCtl_in);
         freeWordTemplate( pWord );
         }
     ...
     fclose(pInputFP);
     fclose(pOutputFP);
     }

Source File
-----------

`dtbin.c'

readTemplateFromText
====================

Syntax
------

     #include "template.h"   /* or opaclib.h */

     WordTemplate * readTemplateFromText(FILE *              pInputFP_in,
                                         const TextControl * pTextCtl_in);

Description
-----------

`readTemplateFromText' reads a word from a text file into a
`WordTemplate' structure.

The arguments to `readTemplateFromText' are as follows:

`pInputFP_in'
     is an input `FILE' pointer.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

a pointer to a dynamically allocated `WordTemplate' data structure, or
`NULL' if either `EOF' or an error occurs

Example
-------

See the example for `freeWordTemplate' above.

Source File
-----------

`textin.c'

readTemplateFromTextString
==========================

Syntax
------

     #include "template.h"   /* or opaclib.h */

     WordTemplate * readTemplateFromTextString(
                                 unsigned char **    ppszString_io,
                                 const TextControl * pTextCtl_in);

Description
-----------

`readTemplateFromTextString' reads a word from a text string into a
`WordTemplate' structure.

The arguments to `readTemplateFromTextString' are as follows:

`ppszString_io'
     points to a pointer which points to the string to be "read".
     This routine advances that pointer past the word it reads.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

a pointer to a dynamically allocated `WordTemplate' data structure, or
`NULL' if either the string is empty (contains only the terminating
`NUL') or an error occurs

Example
-------

     #include "template.h"
     ...
     TextControl sTextCtl_g;
     ...
     WordAnalysis * merge_analyses(
         WordAnalysis *  pList_in,
         WordAnalysis *  pAnal_in)
     {
     ...
     }
     ...
     void process(
         unsigned char * pszInputText_in,
         FILE *          pOutputFP_in)
     {
     unsigned char *     pszInputText;
     unsigned char *     pszWord;
     WordTemplate *      pWord;
     WordAnalysis *      pAnal;
     unsigned            uiAmbiguityCount;
     unsigned long       uiWordCount;
     int                 i;

     pszInputText = (unsigned char *)duplicateString(
                                             (char *)pszInputText_in);
     pszWord = pszInputText;
     for ( uiWordCount = 0L ;; )
         {
         pWord = readTemplateFromTextString(&pszWord, &sTextCtl_g);
         if (pWord == NULL)
             break;
         uiAmbiguityCount = 0;
         if (pWord->paWord != NULL)
             {
             for ( i = 0 ; pWord->paWord[i] ; ++i )
                 {
                 pAnal = analyze(pWord->paWord[i]);
                 pWord->pAnalyses = merge_analyses(pWord->pAnalyses, pAnal);
                 }
             for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
                 ++uiAmbiguityCount;
             }
         writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
         freeWordTemplate(pWord);
         }
     freeMemory(pszInputText);
     }

Source File
-----------

`textin.c'

reallocMemory
=============

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     void * reallocMemory(void * pBuffer_in, size_t uiSize_in);

Description
-----------

`reallocMemory' adjusts an allocated buffer to a new size.  It provides
a "safe" interface to either `realloc' or `malloc', depending on
whether or not `pBuffer_in' is `NULL'.  Running out of memory is
handled the same as for `allocMemory'; see `allocMemory' above.

The arguments to `reallocMemory' are as follows:

`pBuffer_in'
     points to a dynamically allocated buffer previously returned by
     `allocMemory', `reallocMemory', or `duplicateString'.  It may also
     be `NULL' to allocate a new block of memory.
`uiSize_in'
     is the new size, either smaller or larger than the previous
     allocation size.

Return Value
------------

a pointer to a possibly reallocated block

Example
-------

See the example for `fitAllocStringExactly' above.

Source File
-----------

`allocmem.c'

recapitalizeWord
================

Syntax
------

     #include "template.h"   /* or opaclib.h */

     void recapitalizeWord(char *              pszWord_io,
                           int                 iRecap_in,
                           const TextControl * pTextCtl_in);

Description
-----------

`recapitalizeWord' tries to reimpose capitalization as it was in the
original input text.

The arguments to `recapitalizeWord' are as follows:

`pszWord_io'
     points to the word to recapitalize.

`iRecap_in'
     is the capitalization flag:

     `0 (NOCAP)'
          None of the characters are capitalized.

     `1 (INITCAP)'
          Only the initial character is capitalized.

     `2 (ALLCAP)'
          All of the characters are capitalized.

     `4-65535'
          These values are bitmaps of individually capitalized
          characters, with `4' encoding the capitalization of the first
          character, `8' encoding the second character, and so on.

`pTextCtl_in'
     points to a data structure that contains orthographic information.
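The meaning of the capitalization flag can be illustrated with a
minimal sketch.  It assumes plain ASCII letters and the C library
`toupper'; the real `recapitalizeWord' consults the case tables in the
`TextControl' data structure instead.  The function name
`recapitalizeAscii' is hypothetical, and the three macros simply
mirror the flag values listed above.

```c
#include <ctype.h>
#include <string.h>

#define NOCAP   0   /* no characters capitalized */
#define INITCAP 1   /* only the initial character capitalized */
#define ALLCAP  2   /* every character capitalized */

/*
 * Hypothetical ASCII-only illustration of the iRecap_in flag values.
 * This is not the OPAC implementation.
 */
void recapitalizeAscii(char * pszWord_io, int iRecap_in)
{
    size_t i;
    size_t uiLength;

    if (pszWord_io == NULL)
        return;
    uiLength = strlen(pszWord_io);

    if (iRecap_in == NOCAP)
        return;                         /* leave the word as it is */
    if (iRecap_in == INITCAP) {
        if (uiLength > 0)
            pszWord_io[0] = (char)toupper((unsigned char)pszWord_io[0]);
        return;
    }
    if (iRecap_in == ALLCAP) {
        for (i = 0; i < uiLength; ++i)
            pszWord_io[i] = (char)toupper((unsigned char)pszWord_io[i]);
        return;
    }
    /*
     * Bitmap form: bit (i + 2) set means character i is capitalized,
     * so 4 marks the first character, 8 the second, and so on.  A
     * 16-bit flag (up to 65535) therefore covers at most 14 characters.
     */
    for (i = 0; i < uiLength && i + 2 < 16; ++i) {
        if (iRecap_in & (1 << (i + 2)))
            pszWord_io[i] = (char)toupper((unsigned char)pszWord_io[i]);
    }
}
```

For example, a flag of 20 (4 + 16) capitalizes the first and third
characters, so "abc" becomes "AbC".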
Return Value
------------

none

Example
-------

     #include "template.h"

     void fix_new_words(pTemplate_io, pTextCtl_in)
     WordTemplate *      pTemplate_io;
     const TextControl * pTextCtl_in;
     {
     StringList *    pWord;
     char *          p;

     if ((pTemplate_io == NULL) || (pTemplate_io->pNewWords == NULL))
         return;
     if (pTextCtl_in == NULL)
         return;
     for ( pWord = pTemplate_io->pNewWords ; pWord ; pWord = pWord->pNext )
         {
         /*
          *  apply output orthography changes and recapitalize
          */
         p = applyChanges(pWord->pszString, pTextCtl_in->pOutputChanges);
         recapitalizeWord(p, pTemplate_io->iCapital, pTextCtl_in);
         /*
          *  store the modified wordform
          */
         freeMemory(pWord->pszString);
         pWord->pszString = p;
         }
     }

Source File
-----------

`textout.c'

removeDataFromTrie
==================

Syntax
------

     #include "trie.h"   /* or opaclib.h */

     int removeDataFromTrie(Trie * pTrieHead_in,
                            char * pszKey_in,
                            void * pInfo_in,
                            void * (* pfRemoveInfo_in)(void * pOld_in,
                                                       void * pList_io));

Description
-----------

`removeDataFromTrie' removes a stored piece of information from a trie.

The arguments to `removeDataFromTrie' are as follows:

`pTrieHead_in'
     points to the head of a trie.

`pszKey_in'
     points to the key string.

`pInfo_in'
     points to the actual data element to remove.

`pfRemoveInfo_in'
     points to a function for removing the data element from the stored
     information.  The function has two arguments:

     `pOld_in'
          points to the item to remove from the collection (`pInfo_in').

     `pList_io'
          points to a collection of items stored at a `Trie' node
          (`Trieinfo').

     The function returns the updated pointer to the data collection
     for storing as the value of `pTrieInfo'.

Return Value
------------

zero if successful, nonzero if an error occurs

Example
-------

     #include <string.h>
     #include "trie.h"
     #include "rpterror.h"
     #include "allocmem.h"
     ...
     typedef struct lex_item {
         struct lex_item *   pLink;      /* link to next item */
         struct lex_item *   pNext;      /* link to next homograph */
         unsigned char *     pszForm;    /* lexical form (word) */
         unsigned char *     pszGloss;   /* lexical gloss */
         unsigned short      uiCategory; /* lexical category */
         } LexItem;
     ...
     Trie *          pLexicon_g;
     unsigned long   uiLexiconCount_g;
     static char     szWhitespace_m[7] = " \t\r\n\f\v";
     ...
     static void * remove_lex_item(void * pDefunct_in, void * pList_in)
     {
     LexItem *   pDefunct = (LexItem *)pDefunct_in;
     LexItem *   pLex;
     /*
      *  be a little paranoid
      */
     if (pDefunct == NULL)
         return pList_in;
     /*
      *  handle removing the head of the list
      */
     if (pDefunct_in == pList_in)
         return pDefunct->pLink;
     /*
      *  unlink from both the general list and the list of homographs
      */
     for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
         {
         if (pLex->pNext == pDefunct)
             pLex->pNext = pDefunct->pNext;
         if (pLex->pLink == pDefunct)
             {
             pLex->pLink = pDefunct->pLink;
             break;      /* no need to check further */
             }
         }
     return pList_in;
     }

     void remove_from_lexicon(char * pszForm_in,
                              char * pszGloss_in,
                              char * pszCategory_in)
     {
     LexItem *       pLex;
     unsigned short  uiCategory;

     if (    (pszForm_in == NULL) ||
             (pszGloss_in == NULL) ||
             (pszCategory_in == NULL) )
         return;
     uiCategory = index_lexical_category(pszCategory_in);
     for ( pLex = findDataInTrie(pLexicon_g, pszForm_in) ;
           pLex ;
           pLex = pLex->pLink )
         {
         if (    (strcmp((char *)pLex->pszForm, pszForm_in) == 0) &&
                 (strcmp((char *)pLex->pszGloss, pszGloss_in) == 0) &&
                 (pLex->uiCategory == uiCategory) )
             {
             removeDataFromTrie(pLexicon_g, pszForm_in, pLex,
                                remove_lex_item);
             break;
             }
         }
     }

Source File
-----------

`trie.c'

removeFromStringList
====================

Syntax
------

     #include "strlist.h"    /* or strclass.h or change.h or textctl.h
                                or template.h or opaclib.h */

     StringList * removeFromStringList(StringList * pList_io,
                                       const char * pszString_in);

Description
-----------

`removeFromStringList' removes the first occurrence of a string from a
list of strings.
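The head-of-list case is what makes the return value necessary.  The
following minimal sketch shows one way to remove the first matching
string.  It assumes a `StringList' node with `pszString' and `pNext'
members (the member names used in the examples in this manual) and uses
`malloc' and `free' in place of the OPAC allocation functions.  The
names `removeFromStringListSketch' and `prependString' are
hypothetical, and whether the library version frees the removed string
is not stated here.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, mirroring the pszString/pNext members used in
   the examples in this manual. */
typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
} StringList;

/* Helper for building a test list: prepend a private copy of a string. */
StringList * prependString(StringList * pList_in, const char * pszString_in)
{
    StringList * pNode = malloc(sizeof *pNode);

    if (pNode == NULL)
        return pList_in;
    pNode->pszString = malloc(strlen(pszString_in) + 1);
    if (pNode->pszString == NULL) {
        free(pNode);
        return pList_in;
    }
    strcpy(pNode->pszString, pszString_in);
    pNode->pNext = pList_in;
    return pNode;
}

/*
 * Remove the first node whose string equals pszString_in.  Returns the
 * possibly new head of the list, or NULL if the only node was removed.
 */
StringList * removeFromStringListSketch(StringList * pList_io,
                                        const char * pszString_in)
{
    StringList * pPrev = NULL;
    StringList * pNode;

    if (pszString_in == NULL)
        return pList_io;
    for (pNode = pList_io; pNode != NULL; pPrev = pNode, pNode = pNode->pNext) {
        if (strcmp(pNode->pszString, pszString_in) == 0) {
            if (pPrev == NULL)
                pList_io = pNode->pNext;    /* removing the head */
            else
                pPrev->pNext = pNode->pNext;
            free(pNode->pszString);         /* this sketch owns its strings */
            free(pNode);
            break;                          /* first occurrence only */
        }
    }
    return pList_io;
}
```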
The arguments to `removeFromStringList' are as follows:

`pList_io'
     points to a list of strings.

`pszString_in'
     points to the string to be removed.

Return Value
------------

a pointer to the (possibly shorter) list, or `NULL' if the only item in
the list was removed

Example
-------

     #include "strlist.h"
     ...
     static StringList * pNameList_m;
     ...
         char * pszName;
         ...
         pNameList_m = removeFromStringList(pNameList_m, pszName);
         ...

Source File
-----------

`rmstr_sl.c'

reportError
===========

Syntax
------

     #include "rpterror.h"   /* or opaclib.h */

     void reportError(int eMessageType_in, const char * pszFormat_in, ...);

Description
-----------

`reportError' reports an error message to the user.  For MS-DOS and
Unix, `reportError' writes to the standard error output.  The message
is also written to the standard output if it has been redirected.

For GUI programs, the programmer must write a different version of
`reportError' to satisfy the link requirements of other functions in
the OPAC library.  This would typically display a message box.

The arguments to `reportError' are as follows:

`eMessageType_in'
     is the type of error message being reported, one of the following:

     `ERROR_MSG'
          is a message about an erroneous situation.

     `WARNING_MSG'
          is a message about a situation that is not quite an error,
          but not normal either.

     `DEBUG_MSG'
          is a message that only the programmer is expected to
          understand.

`pszFormat_in'
     points to a `printf' style format string for the (error) message.

`...'
     represents zero or more arguments for the format string
     (`pszFormat_in').

Return Value
------------

none

Example
-------

See the example for `addDataToTrie' above.

Source File
-----------

`rpterror.c'

reportMessage
=============

Syntax
------

     #include "rpterror.h"   /* or opaclib.h */

     void reportMessage(int bNotSilent_in, const char * pszFormat_in, ...);

Description
-----------

`reportMessage' displays a message with zero or more arguments.
For MS-DOS and Unix, `reportMessage' writes to the standard error
output.  The message is also written to the standard output if it has
been redirected.

For GUI programs, the programmer must write a different version of
`reportMessage' to satisfy the link requirements of other functions in
the OPAC library.  This would typically write to a message window.

The arguments to `reportMessage' are as follows:

`bNotSilent_in'
     allows writing the message to the standard error output if `TRUE'
     (nonzero).  If `FALSE' (zero), the message is written only to the
     standard output (`stdout'), and then only if it has been
     redirected.  This allows programs to have a "quiet" mode of
     operation without requiring a global variable.

`pszFormat_in'
     points to a `printf' style format string for the message.

`...'
     represents zero or more arguments for the format string
     (`pszFormat_in').

Return Value
------------

none

Example
-------

     #include "rpterror.h"
     ...
     static int iDebugLevel_m;
     ...
     static int read_token(pszBuffer_in, uiBufferSize_in)
     char *      pszBuffer_in;
     unsigned    uiBufferSize_in;
     {
     int     iTokenType;
     ...
     if (iDebugLevel_m >= 8)
         {
         reportMessage(TRUE, "DEBUG read_token(\"%s\",%u) => ",
                       pszBuffer_in, uiBufferSize_in);
         switch (iTokenType)
             {
             case BECOMES:   reportMessage(TRUE, "BECOMES_TOKEN");      break;
             case KEYWORD:   reportMessage(TRUE, "KEYWORD_TOKEN");      break;
             case SYMBOL:    reportMessage(TRUE, "SYMBOL_TOKEN");       break;
             default:        reportMessage(TRUE, "'%c'\t", iTokenType); break;
             }
         reportMessage(TRUE, "\n");
         }
     return( iTokenType );
     }

Source File
-----------

`rptmessg.c'

reportProgress
==============

Syntax
------

     #include "opaclib.h"

     void reportProgress(unsigned long uiCount_in);

Description
-----------

`reportProgress' displays a progress report based on a progress
counter.  The standard version of `reportProgress' actually does
nothing.  For GUI programs, the programmer may write a version of
`reportProgress' to display some sort of progress message using the
progress counter.
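A program may likewise supply its own `reportProgress' for console
use.  The following is a hypothetical sketch, not part of the library:
it writes the count to the standard error output every
`PROGRESS_INTERVAL' items (an assumed constant), overwriting the same
screen line with a carriage return.  The decision is factored into the
hypothetical helper `shouldReportProgress' so that it can be checked
without capturing output.

```c
#include <stdio.h>

#define PROGRESS_INTERVAL 1000UL    /* assumed reporting interval */

/* Return nonzero when the count is a positive multiple of the interval. */
int shouldReportProgress(unsigned long uiCount_in)
{
    return (uiCount_in != 0) && (uiCount_in % PROGRESS_INTERVAL == 0);
}

/* Console replacement for the do-nothing library version. */
void reportProgress(unsigned long uiCount_in)
{
    if (shouldReportProgress(uiCount_in)) {
        fprintf(stderr, "\r%lu items processed", uiCount_in);
        fflush(stderr);
    }
}
```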
`reportProgress' has one argument:

`uiCount_in'
     is a progress count of some sort.

Return Value
------------

none

Example
-------

     #include "opaclib.h"
     ...
     static unsigned long uiTokenCount_m;
     ...
     static int read_token(pszBuffer_in, uiBufferSize_in)
     char *      pszBuffer_in;
     unsigned    uiBufferSize_in;
     {
     int     iTokenType;
     ...
     ++uiTokenCount_m;
     reportProgress( uiTokenCount_m );
     return( iTokenType );
     }

Source File
-----------

`rptprgrs.c'

resetTextControl
================

Syntax
------

     #include "textctl.h"    /* or template.h or opaclib.h */

     void resetTextControl(TextControl * pTextCtl_io);

Description
-----------

`resetTextControl' frees any memory allocated by either
`loadIntxCtlFile' or `loadOutxCtlFile'.  It does not free the
`TextControl' data structure itself.

`resetTextControl' has one argument:

`pTextCtl_io'
     points to a data structure that contains orthographic information.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include <string.h>
     #include "textctl.h"        /* includes strclass.h */
     #include "rpterror.h"
     ...
     char            szIntxFilename_g[200];
     TextControl     sTextControl_g;
     StringClass *   pStringClasses_g = NULL;

     static TextControl sDefaultTextControl_m = {
         NULL,   /* filename */
         NULL,   /* ordered array of lowercase letters */
         NULL,   /* ordered array of matching uppercase letters */
         NULL,   /* array of caseless letters */
         NULL,   /* list of input orthography changes */
         NULL,   /* list of output (orthography) changes */
         NULL,   /* list of format markers (fields) to include */
         NULL,   /* list of format markers (fields) to exclude */
         '\\',   /* initial character of format markers (field codes) */
         '%',    /* character for marking ambiguities and failures */
         '-',    /* character for marking decomposition */
         '|',    /* initial character of secondary format markers */
         NULL,   /* (Manuscripter) bar codes */
         TRUE,   /* flag whether to capitalize individual letters */
         TRUE,   /* flag whether to decapitalize/recapitalize */
         100     /* maximum number of decapitalization alternatives */
         };
     ...
     memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
     fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
     if (fgets(szIntxFilename_g, 200, stdin) == NULL)
         szIntxFilename_g[0] = NUL;
     /* fgets keeps the newline; remove it before testing for a name */
     szIntxFilename_g[strcspn(szIntxFilename_g, "\r\n")] = NUL;
     if (szIntxFilename_g[0])
         {
         if (loadIntxCtlFile(szIntxFilename_g, ';', &sTextControl_g,
                             pStringClasses_g) != 0)
             {
             reportError(ERROR_MSG, "Error reading text control file %s\n",
                         szIntxFilename_g);
             }
         }
     if (    (sTextControl_g.cBarMark == NUL) &&
             (sTextControl_g.pszBarCodes != NULL) )
         {
         freeMemory(sTextControl_g.pszBarCodes);
         sTextControl_g.pszBarCodes = NULL;
         }
     if (    (sTextControl_g.cBarMark != NUL) &&
             (sTextControl_g.pszBarCodes == NULL) )
         {
         sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
                                                 "bdefhijmrsuvyz");
         }
     ...
     resetTextControl(&sTextControl_g);

Source File
-----------

`resetxtc.c'

resetWordFormationChars
=======================

Syntax
------

     #include "textctl.h"    /* or template.h or opaclib.h */

     void resetWordFormationChars(TextControl * pTextCtl_io);

Description
-----------

`resetWordFormationChars' erases the information about word formation
characters stored by previous calls to either `addWordFormationChars'
or `addLowerUpperWFChars'.  This frees any allocated memory and sets
the relevant pointers to `NULL'.

`resetWordFormationChars' has one argument:

`pTextCtl_io'
     points to a data structure that contains orthographic information.

Return Value
------------

none

Example
-------

See the example for `addLowerUpperWFChars' above.

Source File
-----------

`myctype.c'

setAllocMemoryTracing
=====================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     void setAllocMemoryTracing(const char * pszFilename_in);

Description
-----------

`setAllocMemoryTracing' turns debugging on (if a filename is given) or
off (if `pszFilename_in' is `NULL').  If debugging is on, every call to
`allocMemory', `reallocMemory', and `freeMemory' is logged to the given
file for postmortem analysis.
Calls to `duplicateString' are logged as calls to `allocMemory', which
`duplicateString' calls internally.

`setAllocMemoryTracing' has one argument:

`pszFilename_in'
     points to the name of the debugging output file, or is `NULL'.

Return Value
------------

none

Example
-------

     #include <stdlib.h>
     #include "allocmem.h"
     ...
     extern int getopt(int argc, char * const argv[], const char *opts);
     extern char * optarg;
     ...
     int main(int argc, char ** argv)
     {
     void *      pTrapAddress = NULL;
     unsigned    iTrapCount   = 0;
     int         k;
     char *      p;
     ...
     while ((k = getopt(argc, argv, "ai:o:x:z:Z:")) != -1)
         {
         switch (k)
             {
             ...
             case 'z':       /* memory allocation trace filename */
                 setAllocMemoryTracing(optarg);
                 break;

             case 'Z':       /* memory allocation trap address,count */
                 pTrapAddress = (void *)strtoul(optarg, &p, 10);
                 if (*p == ',')
                     iTrapCount = (unsigned)strtoul(p+1, NULL, 10);
                 if (iTrapCount == 0)
                     iTrapCount = 1;
                 setAllocMemoryTrap(pTrapAddress, iTrapCount);
                 break;
             ...
             }
         }
     ...
     }

Source File
-----------

`allocmem.c'

setAllocMemoryTrap
==================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     void setAllocMemoryTrap(const void * pAddress_in, int iCount_in);

Description
-----------

`setAllocMemoryTrap' sets a trap for the `iCount_in'-th reference to
the address `pAddress_in' by either `allocMemory' or `freeMemory'.
This can be useful for tracking down memory allocation bugs.

The arguments to `setAllocMemoryTrap' are as follows:

`pAddress_in'
     is the memory address to trap on.

`iCount_in'
     is the occurrence to trap on.

Return Value
------------

none

Example
-------

See the example for `setAllocMemoryTracing' above.

Source File
-----------

`allocmem.c'

showAmbiguousProgress
=====================

Syntax
------

     #include "opaclib.h"

     unsigned long showAmbiguousProgress(unsigned      uiAmbiguityCount_in,
                                         unsigned long uiItemCount_in);

Description
-----------

`showAmbiguousProgress' displays the progress of the program in a
rudimentary fashion.
If `uiAmbiguityCount_in' is 0, then a star (`*') is written to the
screen, and if `uiAmbiguityCount_in' is 1, then a dot (`.') is written
to the screen.  Otherwise, if `uiAmbiguityCount_in' is less than 10,
the count digit is written, and if it is greater than or equal to 10, a
greater than sign (`>') is written.  These progress characters are
grouped in bunches of 10, with 5 bunches on a line and a space between
bunches.  Every other line ends with the total count of items thus far
(`uiItemCount_in').

The arguments to `showAmbiguousProgress' are as follows:

`uiAmbiguityCount_in'
     is the number of alternative results to report for the current
     item.

`uiItemCount_in'
     is the number of items that have been processed thus far.

Return Value
------------

the updated value for `uiItemCount_in'

Example
-------

See the example for `freeWordTemplate' above.

Source File
-----------

`ambprog.c'

squeezeStringList
=================

Syntax
------

     #include "strlist.h"    /* or strclass.h or change.h or textctl.h
                                or template.h or opaclib.h */

     StringList * squeezeStringList(StringList * pList_io);

Description
-----------

`squeezeStringList' removes any redundant (duplicate) strings from a
list of strings.

`squeezeStringList' has one argument:

`pList_io'
     points to a list of strings.

Return Value
------------

a pointer to the (possibly smaller) list of strings

Example
-------

     #include "template.h"       /* includes strlist.h */
     ...
     static WordTemplate * pTemplate_m = NULL;
     ...
         /*
          *  eliminate identical results
          */
         pTemplate_m->pNewWords = squeezeStringList( pTemplate_m->pNewWords );

Source File
-----------

`sqz_sl.c'

tokenizeString
==============

Syntax
------

     #include "opaclib.h"

     unsigned char * tokenizeString(unsigned char *       pszString_in,
                                    const unsigned char * pszSeparate_in);

Description
-----------

`tokenizeString' splits the string (`pszString_in') into a sequence of
zero or more text tokens separated by spans of one or more characters
from `pszSeparate_in'.
Only the initial call provides a value for `pszString_in'; successive
calls must use a `NULL' pointer for the first argument.  The first
separator character following the token in `pszString_in' is replaced
by a `NUL' character.  Subsequent calls to `tokenizeString' work
through `pszString_in' sequentially.  Note that `pszSeparate_in' may
change from one call to the next.

`tokenizeString' is like `strtok' except that it operates on strings of
`unsigned char' rather than strings of `char'.

The arguments to `tokenizeString' are as follows:

`pszString_in'
     points to a `NUL'-terminated character string, or `NULL'.

`pszSeparate_in'
     points to a `NUL'-terminated set of separator characters, or
     `NULL'.  If it is `NULL', then the rest of the string is returned
     as the token.

Return Value
------------

a pointer to the next token extracted from the input string, or `NULL'
if no more tokens exist

Example
-------

     #include "opaclib.h"
     ...
     unsigned char   szWhitespace_m[7] = " \n\r\t\f\v";
     unsigned char   szInputBuffer_m[1024];
     unsigned char * pszToken;
     ...
     for ( pszToken = tokenizeString(szInputBuffer_m, szWhitespace_m) ;
           pszToken != NULL ;
           pszToken = tokenizeString(NULL, szWhitespace_m) )
         {
         ...
         }
     ...

Source File
-----------

`tokenize.c'

trimTrailingWhitespace
======================

Syntax
------

     #include "opaclib.h"

     char * trimTrailingWhitespace(char * pszString_io);

Description
-----------

`trimTrailingWhitespace' removes any trailing white space characters
from the input string.

`trimTrailingWhitespace' has one argument:

`pszString_io'
     points to a character string.

Return Value
------------

a pointer to the beginning of the input string

Example
-------

     #include "opaclib.h"
     ...
     static char szWhitespace_m[7] = " \t\r\n\f\v";
     ...
     FILE *      pRulesFP;
     unsigned    uiLineNumber;
     char *      pszToken;
     ...
     for ( uiLineNumber = 1 ;; )
         {
         pszToken = readLineFromFile(pRulesFP, &uiLineNumber, ';');
         if (pszToken == NULL)
             break;
         /*
          *  skip leading spaces and remove trailing spaces
          */
         pszToken += strspn(pszToken, szWhitespace_m);
         if (*pszToken == NUL)
             continue;
         trimTrailingWhitespace(pszToken);
         ...
         }

Source File
-----------

`trimspac.c'

unlinkStringList
================

Syntax
------

     #include "strlist.h"    /* or strclass.h or change.h or textctl.h
                                or template.h or opaclib.h */

     void unlinkStringList(StringList ** ppList_io);

Description
-----------

`unlinkStringList' frees the `StringList' data structures in a list of
strings, while leaving intact the strings they point to.

`unlinkStringList' has one argument:

`ppList_io'
     is the address of a pointer to the head of a list of strings to
     unlink.

Return Value
------------

none

Example
-------

     #include "strlist.h"
     ...
     StringList * pList;
     ...
     unlinkStringList(&pList);
     pList = NULL;

Source File
-----------

`unlst_sl.c'

updateStringList
================

Syntax
------

     #include "strlist.h"    /* or strclass.h or change.h or textctl.h
                                or template.h or opaclib.h */

     char * updateStringList(StringList ** ppList_io,
                             const char *  pszString_in);

Description
-----------

`updateStringList' adds the string to the list if it is not already in
the list.  This function is similar to `mergeIntoStringList', except
that it takes the address of the list pointer rather than the list
pointer itself, and it returns a pointer to the stored string rather
than to the list.

The arguments to `updateStringList' are as follows:

`ppList_io'
     is the address of a pointer to the list of strings to be updated.

`pszString_in'
     points to the string to be added to the list of strings.

Return Value
------------

a pointer to the copy of `pszString_in' stored in the list of strings

Example
-------

     #include "strlist.h"
     ...
     static StringList * pCategories_m;
     static char         szBuffer_m[100];
     ...
         char * pszCategory;
         ...
         pszCategory = updateStringList( &pCategories_m, szBuffer_m );
         ...
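The behavior described above can be sketched as follows.  This is a
hypothetical illustration, not the code in `updat_sl.c': it assumes the
same `pszString'/`pNext' node layout used in this manual's examples,
appends new strings at the end of the list (the library's ordering is
not specified here), and uses `malloc' instead of `allocMemory'.  The
name `updateStringListSketch' is an assumption.

```c
#include <stdlib.h>
#include <string.h>

typedef struct string_list {
    char *               pszString;
    struct string_list * pNext;
} StringList;

/*
 * Add a copy of pszString_in to *ppList_io unless an equal string is
 * already present.  Either way, return the copy stored in the list.
 */
char * updateStringListSketch(StringList ** ppList_io,
                              const char *  pszString_in)
{
    StringList ** ppLink;
    StringList *  pNode;

    if ((ppList_io == NULL) || (pszString_in == NULL))
        return NULL;
    /* look for an existing copy, remembering where the list ends */
    for (ppLink = ppList_io; *ppLink != NULL; ppLink = &(*ppLink)->pNext) {
        if (strcmp((*ppLink)->pszString, pszString_in) == 0)
            return (*ppLink)->pszString;
    }
    /* not found: append a new node holding a private copy */
    pNode = malloc(sizeof *pNode);
    if (pNode == NULL)
        return NULL;
    pNode->pszString = malloc(strlen(pszString_in) + 1);
    if (pNode->pszString == NULL) {
        free(pNode);
        return NULL;
    }
    strcpy(pNode->pszString, pszString_in);
    pNode->pNext = NULL;
    *ppLink = pNode;
    return pNode->pszString;
}
```

Because the caller gets back the stored copy rather than its own
argument, a reusable buffer such as `szBuffer_m' in the example above
can be overwritten after each call without disturbing the list.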
Source File
-----------

`updat_sl.c'

walkTrie
========

Syntax
------

     #include "trie.h"   /* or opaclib.h */

     void walkTrie(Trie * pTrieHead_in,
                   void (* pfWalk_in)(void * pList_in));

Description
-----------

`walkTrie' walks through a trie, processing the information stored at
each node.

The arguments to `walkTrie' are as follows:

`pTrieHead_in'
     points to the head of a trie.

`pfWalk_in'
     points to a function for processing the stored information at each
     node of the trie.  The function has one argument:

     `pList_in'
          points to a collection of items stored at a `Trie' node
          (`Trieinfo').

     The function does not return a value.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "trie.h"
     #include "rpterror.h"
     ...
     typedef struct lex_item {
         struct lex_item *   pLink;      /* link to next item */
         struct lex_item *   pNext;      /* link to next homograph */
         unsigned char *     pszForm;    /* lexical form (word) */
         unsigned char *     pszGloss;   /* lexical gloss */
         unsigned short      uiCategory; /* lexical category */
         } LexItem;
     ...
     Trie *  pLexicon_g;
     FILE *  pLexiconFP_m;
     ...
     static void write_lex_items(void * pList_in)
     {
     LexItem *   pLex;

     if (pLexiconFP_m == NULL)
         return;
     for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
         {
         fprintf(pLexiconFP_m, "%-20s %-20s %s\n",
                 (char *)pLex->pszForm, (char *)pLex->pszGloss,
                 get_lexical_category_name(pLex->uiCategory));
         }
     }

     void write_lexicon(const char * pszLexiconFile_in)
     {
     if (pszLexiconFile_in == NULL)
         {
         reportError(WARNING_MSG, "Missing output lexicon filename\n");
         return;
         }
     pLexiconFP_m = fopen(pszLexiconFile_in, "w");
     if (pLexiconFP_m == NULL)
         {
         reportError(WARNING_MSG, "Cannot open lexicon file %s for output\n",
                     pszLexiconFile_in);
         return;
         }
     walkTrie(pLexicon_g, write_lex_items);
     fclose(pLexiconFP_m);
     pLexiconFP_m = NULL;
     }

Source File
-----------

`trie.c'

writeAllocMemoryDebugMsg
========================

Syntax
------

     #include "allocmem.h"   /* or opaclib.h */

     void writeAllocMemoryDebugMsg(const char * pszFormat_in, ...);

Description
-----------

`writeAllocMemoryDebugMsg' writes a message to the memory allocation
tracing file if it is open, and does nothing if that file is not open.
The memory allocation tracing file is opened and closed by
`setAllocMemoryTracing'.

`writeAllocMemoryDebugMsg' is similar to `printf' except that it writes
to a specific (optional) file rather than to the standard output.

The arguments to `writeAllocMemoryDebugMsg' are as follows:

`pszFormat_in'
     points to a `printf' style format string for the message.

`...'
     represents zero or more arguments for the format string
     (`pszFormat_in').

Return Value
------------

none

Example
-------

     #include "allocmem.h"
     #include "strlist.h"
     ...
     StringList * pStrings;
     ...
     writeAllocMemoryDebugMsg("deleting %u strings\n",
                              getStringListSize(pStrings));
     freeStringList(pStrings);
     pStrings = NULL;

Source File
-----------

`allocmem.c'

writeChange
===========

Syntax
------

     #include "change.h"

     void writeChange(const Change * pChange_in, FILE * pOutputFP_in);

Description
-----------

`writeChange' writes the given `Change' data structure to the output
file as a human-readable string consisting of a pair of quoted strings
followed by the environment constraint (if any).

The arguments to `writeChange' are as follows:

`pChange_in'
     points to a single consistent change data structure.  (The `pNext'
     field of the `Change' data structure is ignored.)

`pOutputFP_in'
     is an output `FILE' pointer.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "change.h"
     ...
     void writeChangeList(FILE * pOutputFP_in, Change * pChanges_in)
     {
     Change *    cp;

     if (pOutputFP_in == NULL)
         return;
     for ( cp = pChanges_in ; cp ; cp = cp->pNext )
         writeChange(cp, pOutputFP_in);
     }

Source File
-----------

`change.c'

writeCodeTable
==============

Syntax
------

     #include "record.h"

     void writeCodeTable(FILE * pOutputFP_in, const CodeTable * pTable_in);

Description
-----------

`writeCodeTable' writes the contents of a `CodeTable' data structure to
a file.  The output is useful only for debugging.

The arguments to `writeCodeTable' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pTable_in'
     points to a `CodeTable' data structure.

Return Value
------------

none

Example
-------

     #include "record.h"
     #include "ample.h"

     AmpleData   sAmpleData_g;
     char        szCodesFilename_g[100];
     ...
     loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE);
     writeCodeTable( sAmpleData_g.pLogFP, sAmpleData_g.pPrefixTable );

Source File
-----------

`loadtb.c'

writeStringClasses
==================

Syntax
------

     #include "strclass.h"   /* or change.h or textctl.h or template.h
                                or opaclib.h */

     void writeStringClasses(FILE *              pOutputFP_in,
                             const StringClass * pClasses_in);

Description
-----------

`writeStringClasses' writes the contents of all the string classes in
the list to a file.

The arguments to `writeStringClasses' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pClasses_in'
     points to a list of string classes to write to a file.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "strclass.h"
     ...
     static StringClass * pClasses_m;
     ...
         writeStringClasses(stdout, pClasses_m);
         ...
         }

Source File
-----------

`strcla.c'

writeStringList
===============

Syntax
------

     #include "strlist.h"    /* or strclass.h or change.h or textctl.h
                                or template.h or opaclib.h */

     void writeStringList(const StringList * pList_in,
                          const char *       pszSep_in,
                          FILE *             pOutputFP_in);

Description
-----------

`writeStringList' writes a list of strings to an output file,
separating the individual strings in the list by the indicated string.

The arguments to `writeStringList' are as follows:

`pList_in'
     points to a list of strings.

`pszSep_in'
     points to the string used to separate the members of the list.

`pOutputFP_in'
     is an output `FILE' pointer.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "strlist.h"
     ...
     static StringList * pCategories_m;
     ...
     void showCategories()
     {
     printf("Categories: ");
     writeStringList(pCategories_m, " ", stdout);
     printf("\n");
     }

Source File
-----------

`write_sl.c'

writeTemplate
=============

Syntax
------

     #include "template.h"   /* or opaclib.h */

     void writeTemplate(FILE *               pOutputFP_in,
                        const char *         pszFilename_in,
                        const WordTemplate * pTemplate_in,
                        const TextControl *  pTextCtl_in);

Description
-----------

`writeTemplate' writes the results of a morphological analysis as a
database.  Each word is a record with these fields:

`\a'
     analysis (ambiguities and failures marked)

`\d'
     morpheme decomposition (ambiguities and failures marked)

`\cat'
     final category of word (ambiguities and failures marked)

`\p'
     properties (ambiguities and failures marked)

`\fd'
     feature descriptors (ambiguities and failures marked)

`\u'
     underlying form (ambiguities and failures marked)

`\w'
     original word

`\f'
     preceding format marks

`\c'
     capitalization

`\n'
     trailing nonalphabetics

Ambiguities are marked as `%n%Anal1%Anal2%...%Analn%'.  Failures are
marked as `%0%OriginalWord%' or `%0%%'.  (The separation character can
be set to something other than `%'.)

The arguments to `writeTemplate' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pszFilename_in'
     points to the name of the output file.

`pTemplate_in'
     points to a data structure that contains the word analysis
     information.

`pTextCtl_in'
     points to a data structure that contains orthographic information,
     and also the ambiguity marker character.

Return Value
------------

none

Example
-------

See the example for `freeWordTemplate' above.
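The ambiguity and failure notation can be made concrete with a small
formatter.  This is a sketch under stated assumptions, not the code in
`dtbout.c': the function name `formatAmbiguities' is hypothetical, the
alternatives arrive as an array of strings, and the caller supplies the
marker character (normally `%', the value kept in the `TextControl'
data structure).  The sketch always applies the markers; how the
library writes a single unambiguous result is not covered here.

```c
#include <stdio.h>

/*
 * Write "%n%Alt1%Alt2%...%Altn%" into pszOut_out, or "%0%Word%" when
 * there are no alternatives (pszFailure_in may be "" to get "%0%%").
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int formatAmbiguities(char * pszOut_out, size_t uiOutSize_in,
                      char cMark_in, const char * const * ppszAlts_in,
                      unsigned uiCount_in, const char * pszFailure_in)
{
    size_t   uiUsed;
    unsigned i;
    int      n;

    /* leading marker, count, marker: for example "%2%" */
    n = snprintf(pszOut_out, uiOutSize_in, "%c%u%c",
                 cMark_in, uiCount_in, cMark_in);
    if ((n < 0) || ((size_t)n >= uiOutSize_in))
        return -1;
    uiUsed = (size_t)n;

    if (uiCount_in == 0) {
        /* failure: the original word (possibly empty) and a marker */
        n = snprintf(pszOut_out + uiUsed, uiOutSize_in - uiUsed, "%s%c",
                     pszFailure_in ? pszFailure_in : "", cMark_in);
        return ((n < 0) || ((size_t)n >= uiOutSize_in - uiUsed)) ? -1 : 0;
    }
    /* each alternative is followed by its own marker */
    for (i = 0; i < uiCount_in; ++i) {
        n = snprintf(pszOut_out + uiUsed, uiOutSize_in - uiUsed, "%s%c",
                     ppszAlts_in[i], cMark_in);
        if ((n < 0) || ((size_t)n >= uiOutSize_in - uiUsed))
            return -1;
        uiUsed += (size_t)n;
    }
    return 0;
}
```

With the alternatives "a" and "b" this produces `%2%a%b%', and with no
alternatives and the original word "word" it produces `%0%word%'.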
Source File
-----------

`dtbout.c'

writeTextFromTemplate
=====================

Syntax
------

     #include "template.h"   /* or opaclib.h */

     void writeTextFromTemplate(FILE *               pOutputFP_in,
                                const WordTemplate * pTemplate_in,
                                const TextControl *  pTextCtl_in);

Description
-----------

`writeTextFromTemplate' writes the results of a morphological synthesis
to an output file, restoring all the formatting information associated
with the word in the original input to the analysis.  Ambiguities are
marked as `%n%Word1%Word2%...%Wordn%'.  Failures are marked as
`%0%OriginalWord%'.  (The separation character can be set to something
other than `%'.)

The arguments to `writeTextFromTemplate' are as follows:

`pOutputFP_in'
     is an output `FILE' pointer.

`pTemplate_in'
     points to a data structure containing the word analysis and
     synthesis information.

`pTextCtl_in'
     points to a data structure that contains orthographic information,
     and also the ambiguity marker character.

Return Value
------------

none

Example
-------

See the example for `readTemplateFromAnalysis' above.

Source File
-----------

`textout.c'

writeTrieData
=============

Syntax
------

     #include "trie.h"   /* or opaclib.h */

     void writeTrieData(Trie * pTrieHead_in,
                        void (* pfWriteInfo_in)(void * pList_in,
                                                int    iIndent_in,
                                                FILE * pOutputFP_in),
                        FILE * pOutputFP_in);

Description
-----------

`writeTrieData' walks through a trie, writing the information stored at
each node to a file.  This is intended primarily for debugging, as the
trie structure is explicitly written to the output file in indented
form, together with the information stored in the trie.

The arguments to `writeTrieData' are as follows:

`pTrieHead_in'
     points to the head of a trie.

`pfWriteInfo_in'
     points to a function for writing the stored information to a file.
     The function has three arguments:

     `pList_in'
          points to a collection of items stored at a `Trie' node
          (`Trieinfo').

     `iIndent_in'
          is the number of spaces to indent the display of each data
          item in the collection.
     `pOutputFP_in'
          is the output FILE pointer.

     The function does not return a value.

`pOutputFP_in'
     is an output FILE pointer.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "trie.h"
     #include "rpterror.h"
     ...
     typedef struct lex_item {
         struct lex_item * pLink;      /* link to next item */
         struct lex_item * pNext;      /* link to next homograph */
         unsigned char *   pszForm;    /* lexical form (word) */
         unsigned char *   pszGloss;   /* lexical gloss */
         unsigned short    uiCategory; /* lexical category */
     } LexItem;
     ...
     Trie * pLexicon_g;
     ...
     static void debug_lex_items(void * pList_in, int iIndent_in,
                                 FILE * pOutputFP_in)
     {
         LexItem * pLex;
         int       i;

         if (pOutputFP_in == NULL)
             return;
         for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
         {
             for ( i = 0 ; i < iIndent_in ; ++i )
                 fputc(' ', pOutputFP_in);
             /* %p avoids assuming that a pointer fits in an unsigned long */
             fprintf(pOutputFP_in, "%-20s %-20s %u [%p -> %p]\n",
                     (char *)pLex->pszForm, (char *)pLex->pszGloss,
                     (unsigned)pLex->uiCategory,
                     (void *)pLex, (void *)pLex->pNext);
         }
     }

     void debug_lexicon(void)
     {
         printf("BEGIN LEXICON TRIE DATA\n");
         writeTrieData(pLexicon_g, debug_lex_items, stdout);
         printf("END LEXICON TRIE DATA\n");
     }

Source File
-----------

`trie.c'

writeWordAnalysisList
=====================

Syntax
------

     #include "template.h"

     void writeWordAnalysisList(const WordAnalysis * pAnalyses_in,
                                FILE *               pOutputFP_in);

Description
-----------

`writeWordAnalysisList' writes a list of `WordAnalysis' data structures to an output file for debugging purposes.

The arguments to `writeWordAnalysisList' are as follows:

`pAnalyses_in'
     points to a list of `WordAnalysis' data structures.

`pOutputFP_in'
     is an output FILE pointer.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "template.h"
     ...
     /* Write the contents of a WordTemplate for debugging. */
     void dumpWordTemplate(WordTemplate * pTemplate_in, FILE * pOutputFP_in)
     {
         if (pOutputFP_in == NULL)
             return;
         if (pTemplate_in == NULL)
         {
             fprintf(pOutputFP_in, "WordTemplate ptr is NULL\n");
             return;
         }
         putc('\n', pOutputFP_in);
         fprintf(pOutputFP_in, " orig_word = \"%s\"\n",
                 pTemplate_in->pszOrigWord ? pTemplate_in->pszOrigWord
                                           : "{NULL}" );
         fprintf(pOutputFP_in, " word = \"%s\"\n",
                 pTemplate_in->paWord && pTemplate_in->paWord[0]
                     ? pTemplate_in->paWord[0] : "{NULL}" );
         fprintf(pOutputFP_in, " format = \"%s\"\n",
                 pTemplate_in->pszFormat ? pTemplate_in->pszFormat : "{NULL}" );
         fprintf(pOutputFP_in, " non_alpha = \"%s\"\n",
                 pTemplate_in->pszNonAlpha ? pTemplate_in->pszNonAlpha
                                           : "{NULL}" );
         fprintf(pOutputFP_in, " capital = %d\n", pTemplate_in->iCapital );
         writeWordAnalysisList(pTemplate_in->pAnalyses, pOutputFP_in);
         fprintf(pOutputFP_in, " new_words = ");
         if (pTemplate_in->pNewWords)
         {
             fprintf(pOutputFP_in, "\"");
             writeStringList( pTemplate_in->pNewWords, "\" \"", pOutputFP_in);
             fprintf(pOutputFP_in, "\"\n");
         }
         else
             fprintf(pOutputFP_in, "{NULL}\n");
     }

Source File
-----------

`wordanal.c'

writeWordFormationChars
=======================

Syntax
------

     #include "textctl.h"    /* or template.h or opaclib.h */

     void writeWordFormationChars(FILE *              pOutputFP_in,
                                  const TextControl * pTextCtl_in);

Description
-----------

`writeWordFormationChars' writes the set of word formation characters to an output file. The set written is the one built up by previous calls to `addWordFormationChars' and `addLowerUpperWFChars'.

The arguments to `writeWordFormationChars' are as follows:

`pOutputFP_in'
     is an output FILE pointer.

`pTextCtl_in'
     points to a data structure that contains orthographic information.

Return Value
------------

none

Example
-------

     #include <stdio.h>
     #include "textctl.h"
     ...
     static TextControl sTextCtl_m;
     ...
         printf("The word formation characters are:\n");
         writeWordFormationChars(stdout, &sTextCtl_m);
     ...

Source File
-----------

`myctype.c'

Bibliography
************

  1. Antworth, Evan L. 1990. `PC-KIMMO: a two-level processor for morphological analysis'. Occasional Publications in Academic Computing No. 16. Dallas, TX: Summer Institute of Linguistics.

  2. Kew, Jonathan and Stephen R. McConnel. 1991. `Formatting interlinear text'. Occasional Publications in Academic Computing No. 17. Dallas, TX: Summer Institute of Linguistics.

  3. Knuth, Donald E. 1973. `Sorting and Searching'. Volume 3 of `The Art of Computer Programming'. Reading, MA: Addison-Wesley.

  4. Weber, David J., H. Andrew Black, and Stephen R. McConnel. 1988. `AMPLE: a tool for exploring morphology'. Occasional Publications in Academic Computing No. 12. Dallas, TX: Summer Institute of Linguistics.

  5. Weber, David J., H. Andrew Black, Stephen R. McConnel, and Alan Buseman. 1990. `STAMP: a tool for dialect adaptation'. Occasional Publications in Academic Computing No. 15. Dallas, TX: Summer Institute of Linguistics.

  6. Weber, David J., Stephen R. McConnel, Diana D. Weber, and Beth J. Bryson. 1994. `PRIMER: a tool for developing early reading materials'. Occasional Publications in Academic Computing No. 18. Dallas, TX: Summer Institute of Linguistics.
Table of Contents
*****************

Introduction to the OPAC function library
Variable and function naming conventions
  Preprocessor macro names
  Data structure names
  Variable names
  Function names
  Examples
The OPAC function library data structures
  CaselessLetter
  Change
  ChangeEnvironment
  ChgEnvItem
  CodeTable
  LowerLetter
  NumberedMessage
  StringClass
  StringList
  TextControl
  Trie
  UpperLetter
  WordAnalysis
  WordTemplate
The OPAC function library global variables
  pfOutOfMemory_g
  pRecordBuffer_g
  szOutOfMemoryMarker_g
  szRecordKey_g
  uiRecordBufferSize_g
  uiTrieArrayBlockSize_g
The OPAC functions
  addDataToTrie
  addLowerUpperWFChars
  addLowerUpperWFCharStrings
  addStringClass
  addToStringList
  addWordFormationChars
  addWordFormationCharStrings
  allocMemory
  applyChanges
  buildAdjustedFilename
  buildChangeString
  checkFileError
  cleanupAfterStdFormatRecord
  convLowerToUpper
  convLowerToUpperSet
  convUpperToLower
  convUpperToLowerSet
  decapitalizeWord
  displayNumberedMessage
  duplicateString
  duplicateStringList
  equivalentStringLists
  eraseCharsInString
  eraseTrie
  exitSafely
  fcloseWithErrorCheck
  findDataInTrie
  findStringClass
  fitAllocStringExactly
  fixSynthesizedWord
  fopenAlways
  freeChangeList
  freeCodeTable
  freeMemory
  freeStringClasses
  freeStringList
  freeWordAnalysisList
  freeWordTemplate
  getAndClearAllocMemorySum
  getChangeQuote
  getStringListSize
  identicalStringLists
  isMemberOfStringList
  isolateWord
  isStringClassMember
  loadIntxCtlFile
  loadOutxCtlFile
  matchAlphaChar
  matchBeginning
  matchBeginWithStringClass
  matchCaselessChar
  matchEnd
  matchEndWithStringClass
  matchLowercaseChar
  matchUppercaseChar
  mergeIntoStringList
  mergeIntoStringListAtEnd
  mergeTwoStringLists
  parseChangeString
  promptUser
  readLineFromFile
  readSentenceOfTemplates
  readStdFormatField
  readStdFormatRecord
  readTemplateFromAnalysis
  readTemplateFromText
  readTemplateFromTextString
  reallocMemory
  recapitalizeWord
  removeDataFromTrie
  removeFromStringList
  reportError
  reportMessage
  reportProgress
  resetTextControl
  resetWordFormationChars
  setAllocMemoryTracing
  setAllocMemoryTrap
  showAmbiguousProgress
  squeezeStringList
  tokenizeString
  trimTrailingWhitespace
  unlinkStringList
  updateStringList
  walkTrie
  writeAllocMemoryDebugMsg
  writeChange
  writeCodeTable
  writeStringClasses
  writeStringList
  writeTemplate
  writeTextFromTemplate
  writeTrieData
  writeWordAnalysisList
  writeWordFormationChars
Bibliography