This document describes a library of data structures and functions developed over the years for programs in the Occasional Publications in Academic Computing series published by SIL International. (For SIL International, "academic" refers to linguistics, literacy, anthropology, translation, and related fields.) It is hoped that this documentation will make future maintenance of these programs easier.
The basic goal behind choosing names in the OPAC function library is for the name to convey information about what it represents. This is achieved in two ways: striving for a descriptive name rather than a short cryptic abbreviated name, and following a different pattern of capitalization for each type of name.
Preprocessor macro names are written entirely in capital letters. If the name requires more than one word for an adequate description, the words are joined together with intervening underscore (_) characters.
Data structure names consist of one or more capitalized words. If the name requires more than one word for an adequate description, the words are joined together without underscores, depending on the capitalization pattern to make them readable as separate words.
Variable names in the OPAC function library follow a modified form of the Hungarian naming convention described by Steve McConnell in his book Code Complete on pages 202-206.
Variable names have three parts: a lowercase type prefix, a descriptive name, and a scope suffix.
The type prefix has the following basic possibilities:
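b: a Boolean value, stored as a char, short, or int
c: a character, usually a char but sometimes a short or int
d: a double
e: an enumeration, declared as an enum or as a char, short, or int
i: an integer: int, short, long, or (rarely) char
s: a data structure defined by a struct statement
sz: a NUL-terminated character string
pf: a pointer to a function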
In addition, the basic types may be prefixed by these qualifiers:
u: unsigned
a: array
p: pointer
The descriptive name portion of a variable name consists of one or more capitalized words concatenated together. There are no underscores (_) separating these words from each other, or from the type prefix. For the OPAC function library, the descriptive name for global variables may begin with the name of the most relevant data structure, if any.
The scope suffix has these possibilities:
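_g: a global variable
_m: a module-level variable, visible only within its source file (declared static)
_in: a function argument used only for input
_out: a function argument used only for output
_io: a function argument used for both input and output
_s: a function-local variable that persists between calls (declared static)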
The lack of a scope suffix indicates that a variable is declared within a function and exists on the stack for the duration of the current call.
Global function names in the OPAC function library have two parts: a verb that is all lowercase followed by a noun phrase containing one or more capitalized words. These pieces are concatenated without any intervening underscores (_). For the OPAC library functions, the noun phrase section includes the name of the most relevant data structure, if any.
Given the discussion above, it is easy to discern at a glance what type of item each of the following names refers to.
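SAMPLE_NAME: a preprocessor macro
SampleName: a data structure
pSampleName: a local pointer variable
writeSampleName: a function (one that operates on a SampleName)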
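As an illustration only, the following sketch applies each naming pattern once. Every name in it (MAX_SAMPLE_NAME, SampleName, uiSampleNameCount_g, countSampleNames) is hypothetical and not part of the OPAC function library. The readings of the sz, p, and ui prefixes and of the _g and _in suffixes are inferred from how the library's own declarations use them.

```c
#include <stddef.h>

/* macro: all capitals, words joined by underscores (hypothetical name) */
#define MAX_SAMPLE_NAME 32

/* data structure: capitalized words, no underscores (hypothetical type) */
typedef struct sample_name {
    char                 szName[MAX_SAMPLE_NAME]; /* sz: NUL-terminated string */
    struct sample_name * pNext;                   /* p: pointer */
} SampleName;

/* variable: type prefix "ui", descriptive name, scope suffix "_g" */
unsigned uiSampleNameCount_g = 0;

/* function: lowercase verb followed by a capitalized noun phrase */
size_t countSampleNames(const SampleName * pList_in)
{
    const SampleName * pName;     /* no suffix: ordinary local variable */
    size_t             uiCount = 0;

    for (pName = pList_in; pName != NULL; pName = pName->pNext)
        ++uiCount;
    return uiCount;
}
```

Reading the declarations back, the capitalization alone separates the macro, the type, the variables, and the function.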
This chapter describes the data structures defined for the OPAC function library. These include both general purpose data collection structures and specialized linguistic processing data structures. For each data structure that the library provides, this information includes which header files to include in your source to obtain its definition.
#include "textctl.h" /* or template.h or opaclib.h */

typedef struct caseless_letter {
    unsigned char *          pszLetter;
    struct caseless_letter * pNext;
} CaselessLetter;
The CaselessLetter
data structure is normally used only inside a
TextControl
data structure. It stores a multibyte character
string that represents a single caseless letter.
The fields of the CaselessLetter
data structure are as follows:
pszLetter
pNext
`textctl.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */

typedef struct change_list {
    char *               pszMatch;
    char *               pszReplace;
    ChangeEnvironment *  pEnvironment;
    char *               pszDescription;
    struct change_list * pNext;
} Change;
A Change
data structure stores a single "consistent change" to
apply to character strings. Such consistent changes are usually used
as ordered lists of changes rather than being applied in isolation here
and there.
The fields of the Change
data structure are as follows:
pszMatch
pszReplace
pEnvironment
pszDescription
pNext
`change.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */

typedef struct chg_envir {
    short              bNot;
    ChgEnvItem *       pLeftEnv;
    ChgEnvItem *       pRightEnv;
    struct chg_envir * pNext;
} ChangeEnvironment;
The ChangeEnvironment
data structure is normally used only
inside a Change
data structure.
The fields of the ChangeEnvironment
data structure are as follows:
bNot
pLeftEnv
pRightEnv
pNext
`change.h'
#include "change.h" /* or textctl.h or template.h or opaclib.h */

typedef struct chg_env_item {
    char iFlags;
    union {
        char *        pszString;
        StringClass * pClass;
    } u;
    struct chg_env_item * pNext;
} ChgEnvItem;
The ChgEnvItem
data structure is normally used only inside a
ChangeEnvironment
data structure, which is normally used only
inside a Change
data structure.
The fields of the ChgEnvItem
data structure are as follows:
iFlags & E_NOT
iFlags & E_CLASS
iFlags & E_ELLIPSIS
iFlags & E_OPTIONAL
u.pszString
iFlags & E_CLASS
is 0
.
u.pClass
StringClass
data structure if iFlags & E_CLASS
is not 0
.
See section 3.8 StringClass.
pNext
`change.h'
#include "record.h" /* or opaclib.h */

typedef struct {
    char *   pCodeTable;
    unsigned uiCodeCount;
    char *   pszFirstCode;
} CodeTable;
The CodeTable
data structure is used to map between the field
codes used in a standard format file and single characters used in
case
labels inside switch
statements in C code.
The fields of the CodeTable
data structure are as follows:
pCodeTable
"match1\0A\0match2\0B\0"
. Note that the replacement strings
are assumed to be single characters.
uiCodeCount
pCodeTable
.
pszFirstCode
pCodeTable
.
`record.h'
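The packed layout of pCodeTable ("match1\0A\0match2\0B\0") can be walked pair by pair. The sketch below is only an illustration: lookupFieldCode is a hypothetical helper, not a library function, and it assumes that uiCodeCount counts match/replacement pairs. The CodeTable definition is repeated so that the fragment stands alone.

```c
#include <string.h>

typedef struct {
    char *   pCodeTable;
    unsigned uiCodeCount;
    char *   pszFirstCode;
} CodeTable;

/*
 * Hypothetical helper: return the single-character code that the table
 * associates with a field marker, or 0 if the marker is not in the table.
 */
int lookupFieldCode(const CodeTable * pTable_in, const char * pszMarker_in)
{
    const char * p = pTable_in->pCodeTable;
    unsigned     i;

    for (i = 0; i < pTable_in->uiCodeCount; ++i) {
        const char * pszMatch = p;
        /* the replacement string starts just past the NUL ending the match */
        const char * pszCode = pszMatch + strlen(pszMatch) + 1;

        if (strcmp(pszMatch, pszMarker_in) == 0)
            return (unsigned char)*pszCode;
        /* skip the replacement string and its NUL to reach the next pair */
        p = pszCode + strlen(pszCode) + 1;
    }
    return 0;
}
```

The returned character is the kind of value that would appear in a case label, as in case 'W':.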
#include "textctl.h" /* or template.h or opaclib.h */

typedef struct lower_letter {
    unsigned char *       pszLower;
    StringList *          pUpperList;
    struct lower_letter * pNext;
} LowerLetter;
The LowerLetter
data structure is normally used only inside a
TextControl
data structure. It stores a multibyte character
string that represents a single lowercase letter. It also stores a list
of the corresponding uppercase multigraph character strings.
The fields of the LowerLetter
data structure are as follows:
pszLower
pUpperList
pNext
`textctl.h'
#include "rpterror.h" /* or opaclib.h */

typedef struct {
    int      eType;
    unsigned uiNumber;
    char *   pszMessage;
} NumberedMessage;
The NumberedMessage
data structure stores the information for a
single numbered error or warning message. This is the style of error
reporting used by the PC-Kimmo and PC-PATR programs.
The fields of the NumberedMessage
data structure are as follows:
eType
ERROR_MSG
WARNING_MSG
DEBUG_MSG
uiNumber
pszMessage
printf
style format string for the message.
`rpterror.h'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */

typedef struct string_class {
    char *                pszName;
    StringList *          pMembers;
    struct string_class * pNext;
} StringClass;
The StringClass
data structure stores a labeled set of strings.
The intention is that any one of the set of strings may be used in a
matching operation.
The fields of the StringClass
data structure are as follows:
pszName
pMembers
pNext
`strclass.h'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */

typedef struct strlist {
    char *           pszString;
    struct strlist * pNext;
} StringList;
The StringList
data structure is used to store a collection of
character strings. This collection may be a set (no duplicate
strings), an ordered list, or an unordered list, depending on how the
programmer adds strings to the list.
The fields of the StringList
data structure are as follows:
pszString
pNext
This is one of the most commonly used data structures in the OPAC function library.
3.9.3 Source File `strlist.h'
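Whether a StringList behaves as a set depends on the caller checking for duplicates before adding. A minimal sketch of such a membership test follows. The helper isInStringList is hypothetical and is not claimed to exist in the library; the StringList definition is repeated so that the fragment stands alone.

```c
#include <stddef.h>
#include <string.h>

typedef struct strlist {
    char *           pszString;
    struct strlist * pNext;
} StringList;

/*
 * Hypothetical helper: return 1 if the string is stored in the list,
 * 0 otherwise.  An empty list is represented by NULL.
 */
int isInStringList(const StringList * pList_in, const char * pszString_in)
{
    const StringList * pItem;

    for (pItem = pList_in; pItem != NULL; pItem = pItem->pNext) {
        if (strcmp(pItem->pszString, pszString_in) == 0)
            return 1;
    }
    return 0;
}
```

A caller that wants set behavior would run this test first and add the string only when it returns 0.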
#include "textctl.h" /* or template.h or opaclib.h */

typedef struct text_control {
    char *           pszTextControlFile;
    LowerLetter *    pLowercaseLetters;
    UpperLetter *    pUppercaseLetters;
    CaselessLetter * pCaselessLetters;
    Change *         pOrthoChanges;
    Change *         pOutputChanges;
    StringList *     pIncludeFields;
    StringList *     pExcludeFields;
    unsigned char    cFormatMark;
    unsigned char    cAmbig;
    unsigned char    cDecomp;
    unsigned char    cBarMark;
    unsigned char *  pszBarCodes;
    char             bIndividualCapitalize;
    char             bCapitalize;
    unsigned         uiMaxAmbigDecap;
} TextControl;
The TextControl
data structure is used to control reading a text
file into a (sequence of) WordTemplate
data structure(s), or
writing a (sequence of) WordTemplate
data structure(s) to a text
file.
The fields of the TextControl
data structure are as follows:
pszTextControlFile
pLowercaseLetters
pUppercaseLetters
pCaselessLetters
pOrthoChanges
pOutputChanges
pIncludeFields
pExcludeFields
cFormatMark
cAmbig
cDecomp
cBarMark
pszBarCodes
bIndividualCapitalize
bCapitalize
uiMaxAmbigDecap
`textctl.h'
#include "trie.h" /* or opaclib.h */

typedef struct s__trienode {
    unsigned char        cLetter;
    struct s__trienode * pChildren;
    struct s__trienode * pSiblings;
    void *               pTrieInfo;
} Trie;
A trie is a data structure designed for relatively fast insertion and relatively fast retrieval of information referenced by a "key" string. See Knuth 1973, pages 481-505, for an extended treatment of tries.
The fields of the Trie
data structure are as follows:
cLetter
pChildren
cLetter
in
their key at this point.
pSiblings
cLetter
in their key at this point.
pTrieInfo
`trie.h'
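To make the roles of pChildren and pSiblings concrete, here is a sketch of a key lookup over this node layout. It assumes the usual first-child/next-sibling reading: pSiblings links alternative letters at the same key position, and pChildren leads to the next key position. The function findTrieInfo is hypothetical. It also assumes that each key is stored to its full length, so it ignores the depth limit that addDataToTrie accepts.

```c
#include <stddef.h>

typedef struct s__trienode {
    unsigned char        cLetter;
    struct s__trienode * pChildren;
    struct s__trienode * pSiblings;
    void *               pTrieInfo;
} Trie;

/*
 * Hypothetical helper: follow the key one letter at a time and return
 * the information stored at the node for its last letter, or NULL.
 */
void * findTrieInfo(const Trie * pTrieHead_in, const char * pszKey_in)
{
    const Trie *          pNode = pTrieHead_in;
    const unsigned char * p = (const unsigned char *)pszKey_in;

    if (*p == '\0')
        return NULL;                    /* an empty key matches nothing */
    while (pNode != NULL) {
        if (pNode->cLetter != *p) {
            pNode = pNode->pSiblings;   /* try another letter at this position */
            continue;
        }
        ++p;
        if (*p == '\0')
            return pNode->pTrieInfo;    /* the whole key has been consumed */
        pNode = pNode->pChildren;       /* descend to the next key position */
    }
    return NULL;
}
```

Each step either moves sideways through the letters available at one position or moves down one position, so the cost grows with the key length and the branching at each level rather than with the number of stored keys.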
#include "textctl.h" /* or template.h or opaclib.h */

typedef struct upper_letter {
    unsigned char *       pszUpper;
    StringList *          pLowerList;
    struct upper_letter * pNext;
} UpperLetter;
The UpperLetter
data structure is normally used only inside a
TextControl
data structure. It stores a multibyte character
string that represents a single uppercase letter. It also stores a list
of the corresponding lowercase multigraph character strings.
The fields of the UpperLetter
data structure are as follows:
pszUpper
pLowerList
pNext
Application programmers should not need to use this data structure
directly, as its only use is for a list embedded in the
TextControl
data structure.
3.12.3 Source File `textctl.h'
#include "template.h" /* or opaclib.h */

typedef struct word_analysis {
    char *                 pszAnalysis;
    char *                 pszDecomposition;
    char *                 pszCategory;
    char *                 pszProperties;
    char *                 pszFeatures;
    char *                 pszUnderlyingForm;
    char *                 pszSurfaceForm;
    struct word_analysis * pNext;
} WordAnalysis;
The WordAnalysis
data structure is normally used as part of a
WordTemplate
data structure to record the result of
morphological analysis.
The fields of the WordAnalysis
data structure are as follows:
pszAnalysis
pszDecomposition
cDecomp
field of a TextControl
data structure.
pszCategory
=
).
pszProperties
=
).
pszFeatures
=
).
pszUnderlyingForm
cDecomp
field of a TextControl
data
structure.
pszSurfaceForm
pNext
`template.h'
typedef struct {
    char *         pszFormat;
    char *         pszOrigWord;
    char **        paWord;
    char *         pszNonAlpha;
    short          iCapital;
    short          iOutputFlags;
    WordAnalysis * pAnalyses;
    StringList *   pNewWords;
} WordTemplate;
The WordTemplate
data structure is used to hold a single word
for processing, with the original capitalization and punctuation
preserved for restoration on output.
The fields of the WordTemplate
data structure are as follows:
pszFormat
pszOrigWord
paWord
NULL
-terminated array of alternative surface forms
after decapitalization and orthography changes.
pszNonAlpha
iCapital
NOCAP
INITCAP
ALLCAP
4-65535
4
is the first letter being capitalized, 8
is the second letter being capitalized, and so on. This scheme handles
only the first 14 characters of the word.
iOutputFlags & WANT_DECOMPOSITION
pAnalyses->pszDecomposition
) to
be written to an output file if set (nonzero).
iOutputFlags & WANT_CATEGORY
pAnalyses->pszCategory
) to be
written to an output file if set.
iOutputFlags & WANT_PROPERTIES
pAnalyses->pszProperties
) to be
written to an output file if set.
iOutputFlags & WANT_FEATURES
pAnalyses->pszFeatures
) to
be written to an output file if set.
iOutputFlags & WANT_UNDERLYING
pAnalyses->pszUnderlyingForm
)
to be written to an output file if set.
iOutputFlags & WANT_ORIGINAL
pszOrigWord
) to be written to an
output file if set.
pAnalyses
pNewWords
`template.h'
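The per-letter capitalization values of iCapital (4 for the first letter, 8 for the second, and so on) amount to one bit per letter position, starting at bit 2. The sketch below decodes that mask. The helper isLetterCapitalized is hypothetical, and it assumes the caller has already handled the NOCAP, INITCAP, and ALLCAP values separately.

```c
/*
 * Hypothetical helper: report whether the letter at a zero-based
 * position is marked as capitalized in a per-letter capitalization mask.
 * Position 0 corresponds to the value 4, position 1 to 8, and so on.
 */
int isLetterCapitalized(unsigned uiCapital_in, unsigned uiLetter_in)
{
    if (uiLetter_in >= 14)
        return 0;       /* the scheme covers only the first 14 letters */
    return (uiCapital_in & (4u << uiLetter_in)) != 0;
}
```

Under this reading, a mask of 12 (4 + 8) marks the first two letters of the word as capitalized.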
This chapter gives the proper usage information about each of the global variables found in the OPAC function library. For each global variable that the library provides, this information includes which header files to include in your source to obtain the extern declaration for that variable.
#include "allocmem.h" /* or opaclib.h */ extern void (* pfOutOfMemory_g)(size_t uiSize_in);
pfOutOfMemory_g
points to a function used by allocMemory
and related functions whenever malloc
or realloc
return a
NULL
. This function has one argument, the size of the
allocation request that failed. It is assumed that this function does
not return normally, so that programs that use allocMemory
do
not need to check for a successful memory allocation. This can be
satisfied either by aborting the program or by judicious use of
setjmp
and longjmp
.
The default value for pfOutOfMemory_g
is NULL
.
This causes a function to be used which simply displays an error
message (using szOutOfMemoryMarker_g
) and aborts the program.
4.1.3 Example
#include <stdio.h>
#include <setjmp.h>
#include "allocmem.h"
...
static jmp_buf jmpNoMemory_m;

static void out_of_memory(size_t uiRequest_in)
{
    fprintf(stderr, "Out of memory requesting %lu bytes---trying to recover\n",
            (unsigned long)uiRequest_in);
    longjmp( jmpNoMemory_m, 1 );
}

char * processData(void)
{
    char * p;

    if (setjmp( jmpNoMemory_m ))
    {
        /* free any memory left hanging in mid air */
        ...
        pfOutOfMemory_g = NULL;     /* restore default behavior here too */
        return NULL;
    }
    pfOutOfMemory_g = out_of_memory;
    p = processSafely();
    pfOutOfMemory_g = NULL;         /* restore default behavior */
    return p;
}
`allocmem.c'
#include "record.h" /* or opaclib.h */ extern char * pRecordBuffer_g;
pRecordBuffer_g
points to the dynamically allocated buffer used
by readStdFormatRecord
for its return value. Allocating this
buffer is handled automatically (but perhaps not optimally) if the
programmer does not allocate it explicitly.
4.2.3 Example
#include "record.h" #include "allocmem.h" #define BIG_RECSIZE 16000 #define SMALL_RECSIZE 500 ... /* * allocate space for records */ pRecordBuffer_g = (char *)allocMemory( BIG_RECSIZE ); uiRecordBufferSize_g = BIG_RECSIZE; ... /* * reduce amount of memory allocated for records */ freeMemory( pRecordBuffer_g ); pRecordBuffer_g = (char *)allocMemory( SMALL_RECSIZE ); uiRecordBufferSize_g = SMALL_RECSIZE; ... /* * release memory allocated for records */ cleanupAfterStdFormatRecord();
`record.c'
#include "allocmem.h" /* or opaclib.h */ extern char szOutOfMemoryMarker_g[/*101*/];
szOutOfMemoryMarker_g
is a character array used by
allocMemory
and friends whenever malloc
or realloc
return a NULL
and pfOutOfMemory_g
is NULL
. The
contents of the character array are used as part of the error message
notifying the user that a request for more memory has failed.
The default value for szOutOfMemoryMarker_g
is to be empty (all
NUL
bytes). This means that no context sensitive information is
provided in the error message displayed just before the program aborts.
4.3.3 Example
#include "allocmem.h" ... int * piArray; ... strncpy(szOutOfMemoryMarker_g, "creating huge array", 100); piArray = allocMemory( 100000 * sizeof(int) );
`allocmem.c'
#include "record.h" /* or opaclib.h */ /*#define MAX_RECKEY_SIZE 64*/ extern char szRecordKey_g[MAX_RECKEY_SIZE];
readStdFormatRecord
stores the first MAX_RECKEY_SIZE-1
characters following the record marker in szRecordKey_g
. This
may or may not be useful information.
4.4.3 Example
#include <stdio.h>
#include <string.h>
#include "record.h"
#include "rpterror.h"
...
void load_dictionary(
    char *      pszInputFile_in,
    CodeTable * pCodeTable_in,
    int         cComment_in)
{
    FILE *   pInputFP;
    char *   pRecord;
    char *   pszField;
    char *   pszNextField;
    unsigned uiRecordCount = 0;

    pInputFP = fopen(pszInputFile_in, "r");
    if (pInputFP == NULL)
    {
        reportError(WARNING_MSG, "Cannot open dictionary file %s\n",
                    pszInputFile_in);
        return;
    }
    while ((pRecord = readStdFormatRecord(pInputFP,
                                          pCodeTable_in,
                                          cComment_in,
                                          &uiRecordCount)) != NULL)
    {
        pszField = pRecord;
        while (*pszField)
        {
            pszNextField = pszField + strlen(pszField) + 1;
            switch (*pszField)
            {
            case 'A':
                ...
                break;
            case 'B':
                ...
                break;
            ...
            default:
                /* szRecordKey_g holds the key of the record just read */
                reportError(WARNING_MSG,
                            "Warning: unrecognized field in record %u (%s)\n%s\n",
                            uiRecordCount, szRecordKey_g, pszField);
                break;
            }
            ...
            pszField = pszNextField;
        }
        ...
    }
    cleanupAfterStdFormatRecord();
    fclose(pInputFP);
    ...
}
`record.c'
#include "record.h" /* or opaclib.h */ extern unsigned uiRecordBufferSize_g;
uiRecordBufferSize_g
stores the number of bytes allocated for
pRecordBuffer_g
.
4.5.3 Example See section 4.2 pRecordBuffer_g.
4.5.4 Source File `record.c'
#include "trie.h" /* or opaclib.h */ extern size_t uiTrieArrayBlockSize_g;
Trie
nodes are allocated uiTrieArrayBlockSize_g
nodes at
a time for efficiency.
The default value for uiTrieArrayBlockSize_g
is 2000, which
minimizes the number of calls to allocMemory
, but potentially
wastes several thousand bytes of memory.
4.6.3 Example
#include "strlist.h" #include "trie.h" ... Trie * pLexicon = NULL; StringList * pNewString; ... VOIDP addStringToList(VOIDP pNew_in, VOIDP pList_in) { StringList * pList = pList_in; StringList * pNew = pNew_in; pNew->pNext = pList; return pNew; } ... uiTrieArrayBlockSize_g = 63; /* less time efficient, but more space efficient */ ... pNewString = mergeIntoStringList(NULL, "Test value"); pLexicon = addDataToTrie(pLexicon, pNewString->pszString, pNewString, addStringToList, 3);
`trie.c'
This chapter gives the proper usage information about each of the functions found in the OPAC function library. For each function that the library provides, this information includes which header files to include in your source to obtain prototypes and type definitions relevant to the use of that function.
#include "trie.h" /* or opaclib.h */ Trie * addDataToTrie(Trie * pTrieHead_io, const char * pszKey_in, void * pInfo_in, void * (* pfLinkInfo_in)(void * pNew_in, void * pList_io), int iMaxTrieDepth_in);
addDataToTrie
adds information to a trie, using the given
insertion key.
The arguments to addDataToTrie
are as follows:
pTrieHead_io
NULL
the first time
addDataToTrie
is called. Each subsequent call should use the
value returned by the preceding call.
pszKey_in
pInfo_in
Trie
for data storage and retrieval.
pfLinkInfo_in
pTrieInfo
field of the leaf Trie
data structure found or created for this
key. The function has two arguments:
pNew_in
pList_io
Trie
node
(pTrieInfo
), or is NULL
.
pTrieInfo
.
iMaxTrieDepth_in
a pointer to the head of the modified trie
5.1.4 Example
#include <stdio.h>
#include <string.h>
#include "trie.h"
#include "rpterror.h"
#include "allocmem.h"
...
typedef struct lex_item {
    struct lex_item * pLink;        /* link to next item */
    struct lex_item * pNext;        /* link to next homograph */
    unsigned char *   pszForm;      /* lexical form (word) */
    unsigned char *   pszGloss;     /* lexical gloss */
    unsigned short    uiCategory;   /* lexical category */
} LexItem;
...
Trie *        pLexicon_g;
unsigned long uiLexiconCount_g;
static char   szWhitespace_m[7] = " \t\r\n\f\v";
...
static void * add_lex_item(void * pNew_in, void * pList_in)
{
    LexItem * pLex;
    /*
     *  be a little paranoid
     */
    if (pNew_in == NULL)
        return pList_in;
    /*
     *  link the list of items that start out the same
     */
    ((LexItem *)pNew_in)->pLink = (LexItem *)pList_in;
    /*
     *  link the list of homographs
     */
    for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
    {
        if (strcmp((char *)((LexItem *)pNew_in)->pszForm,
                   (char *)pLex->pszForm) == 0)
        {
            ((LexItem *)pNew_in)->pNext = pLex;
            break;
        }
    }
    return pNew_in;
}

void load_lexicon(char * pszLexiconFile_in)
{
    FILE *    pLexiconFP;
    char      szBuffer[512];
    char *    pszForm;
    char *    pszGloss;
    char *    pszCategory;
    LexItem * pLexItem;

    if (pszLexiconFile_in == NULL)
    {
        reportError(WARNING_MSG, "Missing input lexicon filename\n");
        return;
    }
    pLexiconFP = fopen(pszLexiconFile_in, "r");
    if (pLexiconFP == NULL)
    {
        reportError(WARNING_MSG, "Cannot open lexicon file %s for input\n",
                    pszLexiconFile_in);
        return;
    }
    while (fgets(szBuffer, 512, pLexiconFP) != NULL)
    {
        pszForm     = strtok(szBuffer, szWhitespace_m);
        pszGloss    = strtok(NULL,     szWhitespace_m);
        pszCategory = strtok(NULL,     szWhitespace_m);
        if (    (pszForm     == NULL) ||
                (pszGloss    == NULL) ||
                (pszCategory == NULL) )
            continue;
        /* allocMemory is the allocator declared in allocmem.h */
        pLexItem = (LexItem *)allocMemory(sizeof(LexItem));
        pLexItem->pLink      = NULL;
        pLexItem->pNext      = NULL;
        pLexItem->pszForm    = (unsigned char *)duplicateString(pszForm);
        pLexItem->pszGloss   = (unsigned char *)duplicateString(pszGloss);
        pLexItem->uiCategory = index_lexical_category(pszCategory);
        pLexicon_g = addDataToTrie(pLexicon_g, pszForm, pLexItem,
                                   add_lex_item, 3);
        ++uiLexiconCount_g;
    }
    fclose(pLexiconFP);
}
`trie.c'
#include "textctl.h" /* or template.h or opaclib.h */ void addLowerUpperWFChars(char * pszLUPairs_in, TextControl * pTextCtl_io);
addLowerUpperWFChars
scans the input string for character pairs.
The first member of each pair is added to the set of (multibyte)
lowercase alphabetic characters, and the second member is added to the
set of (multibyte) uppercase alphabetic characters. Note that there may
be a many-to-many mapping between lowercase and uppercase characters.
The arguments to addLowerUpperWFChars
are as follows:
pszLUPairs_in
pTextCtl_io
none
5.2.4 Example
#include "textctl.h" ... TextControl sTextInputCtl_m; ... void set_alphabetic(pszField_in) char * pszField_in; { int code; char * psz; psz = pszField_in; code = *psz++; switch (code) { case 'A': /* alphabetic (word formation) characters */ addWordFormationChars(psz, &sTextInputCtl_m); break; case 'L': /* lower-upper word formation characters */ addLowerUpperWFChars(psz, &sTextInputCtl_m); break; case 'a': /* multibyte alphabetic (word formation) characters */ addWordFormationCharStrings(psz, &sTextInputCtl_m); break; case 'l': /* multibyte lower-upper word formation characters */ addLowerUpperWFCharStrings(psz, &sTextInputCtl_m); break; default: break; } } void reset_alphabetic() { resetWordFormationChars(&sTextInputCtl_m); }
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ void addLowerUpperWFCharStrings(char * pszLUPairs_in, TextControl * pTextCtl_io);
addLowerUpperWFCharStrings
scans the input string for pairs of
multibyte characters. The first member of each pair is added to the set
of multibyte lowercase alphabetic characters, and the second member is
added to the set of multibyte uppercase alphabetic characters. Note that
there may be a many-to-many mapping between lowercase and uppercase
multibyte characters.
The arguments to addLowerUpperWFCharStrings
are as follows:
pszLUPairs_in
pTextCtl_io
none
5.3.4 Example See section 5.2 addLowerUpperWFChars.
5.3.5 Source File `myctype.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ StringClass * addStringClass(char * pszField_in, StringClass * pClasses_io);
addStringClass
adds a string class to the list of string
classes. String classes are used in string environments such as those
in the consistent change notation supported by the OPAC function
library.
The arguments to addStringClass
are as follows:
pszField_in
pClasses_io
NULL
the
first time addStringClass
is called. Each subsequent call
should use the value returned by the preceding call.
a pointer to the head of the updated list of string classes
5.4.4 Example
#include "change.h" /* includes strclass.h */ ... static Change * pChanges_m = NULL; static StringClass * pClasses_m = NULL; ... void store_change_info(pszField_in) char * pszField_in; { Change * pChg; char * psz; int code; if (pszField_in == NULL) return; psz = pszField_in; code = *psz++; /* grab the table code */ switch (code) { case 'C': /* change */ pChg = parseChangeString( psz, pClasses_m ); if (pChg != (Change *)NULL) { pChg->pNext = pChanges_m; pChanges_m = pChg; } break; case 'S': /* string class */ pClasses_m = addStringClass( psz, pClasses_m ); break; default: break; } }
`strcla.c'
#include "strlist.h" StringList * addToStringList(StringList * pList_in, const char * pszString_in);
addToStringList
adds a string to the beginning of a list of
strings. It does not check whether the string is already in the list.
The arguments to addToStringList
are as follows:
pList_in
NULL
to signal an empty list.
pszString_in
NUL
-terminated character string.
a pointer to the revised list
5.5.4 Example
#include "strlist.h" ... StringList * pStrings = NULL; ... /* pStrings-->NULL */ pStrings = addToStringList(pStrings, "this"); /* pStrings-->"this"-->NULL */ pStrings = addToStringList(pStrings, "test"); /* pStrings-->"test"-->"this"-->NULL */ pStrings = addToStringList(pStrings, "is"); /* pStrings-->"is"-->"test"-->"this"-->NULL */ pStrings = addToStringList(pStrings, "a"); /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */ pStrings = addToStringList(pStrings, "test"); /* pStrings-->"test"-->"a"-->"is"-->"test"-->"this"-->NULL */
`add_sl.c'
#include "textctl.h" /* or template.h or opaclib.h */ void addWordFormationChars(char * pszLetters_in, TextControl * pTextCtl_io);
addWordFormationChars
scans the input string for non-whitespace
characters. Each such character is added to the set of alphabetic
characters that do not have a lowercase/UPPERCASE distinction. (An
English example would be the apostrophe character.)
The arguments to addWordFormationChars
are as follows:
pszLetters_in
pTextCtl_io
none
5.6.4 Example See section 5.2 addLowerUpperWFChars.
5.6.5 Source File `myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ void addWordFormationCharStrings(char * pszLetters_in, TextControl * pTextCtl_io);
addWordFormationCharStrings
scans the input string for multibyte
characters. Each such multibyte character sequence is added to the set
of multibyte caseless alphabetic characters.
The arguments to addWordFormationCharStrings
are as follows:
pszLetters_in
pTextCtl_io
none
5.7.4 Example See section 5.2 addLowerUpperWFChars.
5.7.5 Source File `myctype.c'
#include "allocmem.h" /* or opaclib.h */ void * allocMemory(size_t uiSize_in);
allocMemory
provides a "safe" interface to malloc
. If
the requested memory cannot be allocated, the function pointed to by
pfOutOfMemory_g
is called. If pfOutOfMemory_g
is
NULL
, then the default behavior is to display an error message
incorporating the string stored in szOutOfMemoryMarker_g
and
abort the program.
It is assumed that allocMemory
always returns a good value.
This implies that any function pointed to by pfOutOfMemory_g
either aborts the program or uses longjmp
to escape to a safe
place in the program.
allocMemory
has a single argument:
uiSize_in
a pointer to the beginning of the memory area allocated
5.8.4 Example
#include "allocmem.h" ... char * p; ... p = allocMemory(75);
`allocmem.c'
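The contract described above (either return usable memory or never return) can be sketched in a few lines. This is not the library's implementation: allocOrDie and pfNoMemoryHandler_m are hypothetical stand-ins for allocMemory and pfOutOfMemory_g, and the error text is invented for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* hypothetical stand-in for pfOutOfMemory_g; NULL selects the default */
static void (* pfNoMemoryHandler_m)(size_t uiSize_in) = NULL;

/*
 * Hypothetical stand-in for allocMemory: return usable memory, or do
 * not return at all.
 */
void * allocOrDie(size_t uiSize_in)
{
    void * p;

    if (uiSize_in == 0)
        uiSize_in = 1;      /* malloc(0) may legitimately return NULL */
    p = malloc(uiSize_in);
    if (p == NULL) {
        /* a user-supplied handler is expected to abort or longjmp away */
        if (pfNoMemoryHandler_m != NULL)
            (*pfNoMemoryHandler_m)(uiSize_in);
        /* default behavior, also the fallback if the handler returns */
        fprintf(stderr, "Out of memory requesting %lu bytes\n",
                (unsigned long)uiSize_in);
        abort();
    }
    return p;
}
```

Because the failure path never falls through to the return statement, callers can use the result without testing it for NULL, which is the property the text above relies on.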
#include "change.h" /* or textctl.h or template.h or opaclib.h */ char * applyChanges(const char * pszString_in, const Change * pChangeList_in);
applyChanges
applies a list of consistent changes to a string.
The function steps through the list of changes, applying each change as
often as necessary before trying the next change in the list. The
input string is not changed; rather, a copy is created, modified, and
returned.
The arguments to applyChanges
are as follows:
pszString_in
pChangeList_in
a pointer to a dynamically allocated and (possibly) changed string
5.9.4 Example
#include "change.h" ... Change * pChanges_m; ... char * pszChanged; ... pszChanged = applyChanges("this is a test", pChanges_m); ... freeMemory( pszChanged );
`change.c'
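The ordering behavior described for applyChanges (each change is applied throughout the string before the next change is tried) can be illustrated with a much reduced sketch. SimpleChange and applySimpleChanges are hypothetical: they ignore environments and string classes entirely, use plain malloc and free, and treat "as often as necessary" as replacing every non-overlapping occurrence from left to right. None of this is claimed to match the library's internals.

```c
#include <stdlib.h>
#include <string.h>

typedef struct simple_change {          /* hypothetical, reduced Change */
    const char *           pszMatch;
    const char *           pszReplace;
    struct simple_change * pNext;
} SimpleChange;

/* return a new string with every occurrence of the match replaced */
static char * replaceAll(const char * pszString_in,
                         const char * pszMatch_in,
                         const char * pszReplace_in)
{
    size_t       uiMatch = strlen(pszMatch_in);
    size_t       uiReplace = strlen(pszReplace_in);
    size_t       uiCount = 0;
    const char * p;
    char *       pszResult;
    char *       q;

    if (uiMatch > 0) {                  /* count the matches to size the copy */
        for (p = pszString_in; (p = strstr(p, pszMatch_in)) != NULL; p += uiMatch)
            ++uiCount;
    }
    pszResult = malloc(strlen(pszString_in) - uiCount * uiMatch
                       + uiCount * uiReplace + 1);
    if (pszResult == NULL)
        return NULL;
    for (p = pszString_in, q = pszResult; *p != '\0'; ) {
        if (uiMatch > 0 && strncmp(p, pszMatch_in, uiMatch) == 0) {
            memcpy(q, pszReplace_in, uiReplace);
            q += uiReplace;
            p += uiMatch;
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
    return pszResult;
}

/* apply the changes in list order; the input string is left untouched */
char * applySimpleChanges(const char * pszString_in,
                          const SimpleChange * pChangeList_in)
{
    const SimpleChange * pChange;
    char *               pszNext;
    char *               pszResult = malloc(strlen(pszString_in) + 1);

    if (pszResult == NULL)
        return NULL;
    strcpy(pszResult, pszString_in);
    for (pChange = pChangeList_in; pChange != NULL; pChange = pChange->pNext) {
        pszNext = replaceAll(pszResult, pChange->pszMatch, pChange->pszReplace);
        free(pszResult);
        if (pszNext == NULL)
            return NULL;
        pszResult = pszNext;
    }
    return pszResult;
}
```

Because each change sees the output of the changes before it, the order of the list matters: a later change can rewrite text that an earlier change produced.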
#include "opaclib.h" char * buildAdjustedFilename(const char * pszFilename_in, const char * pszBasePathname_in, const char * pszExtension_in);
buildAdjustedFilename
builds a filename from the pieces given.
If the base pathname contains directory information, and the input
filename is not an absolute pathname, the leading directory information
is added to the output filename. If the extension is given, and the
input filename does not have an extension, the extension is added to
the output filename if the file cannot be opened for input without it.
The arguments to buildAdjustedFilename
are as follows:
pszFilename_in
pszBasePathname_in
NULL
.
pszExtension_in
NULL
.
a pointer to a dynamically allocated filename string
5.10.4 Example
#include "opaclib.h" ... int readControlFile(char * pszControlFile_in) { char * pszIncludeFile; char szBuffer[512]; FILE * pControlFP; char * p; pControlFP = fopen(pszControlFile_in, "r"); if (pControlFP == NULL) return 0; while (fgets(szBuffer, 512, pControlFP) != NULL) { p = szBuffer + strlen(szBuffer) - 1; if (*p == '\n') *p = '\0'; if (strncmp(szBuffer, "\\include", 8) == 0) { pszIncludeFile = szBuffer + 8; pszIncludeFile += strspn(pszIncludeFile, " \t\r\n\f"); if (*pszIncludeFile == '\0') continue; pszIncludeFile = buildAdjustedFilename(pszIncludeFile, pszControlFile_in, ".ctl"); readControlFile(pszIncludeFile); freeMemory(pszIncludeFile); } ... } fclose(pControlFP); return 1; }
`adjfname.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */ char * buildChangeString(const Change * pChange_in);
buildChangeString
builds a textual representation of the given
consistent change data structure.
buildChangeString
has one argument:
pChange_in
pNext
field of the Change
data structure is ignored.)
a pointer to a dynamically allocated string representing the change, or
NULL
if an error occurs
5.11.4 Example
#include "change.h" ... void displayChangeList(Change * pChanges_in) { Change * pChange; char * pszChange; for ( pChange = pChanges_in ; pChange ; pChange = pChange->pNext ) { pszChange = buildChangeString( pChange ); fprintf(stderr, "%s\n", pszChange); freeMemory( pszChange ); } }
`change.c'
#include <stdio.h> #include "opaclib.h" void checkFileError(FILE * pOutputFP_in, const char * pszProcessName_in, const char * pszFilename_in);
checkFileError
checks for an error in the output file
pOutputFP_in
whose name is given by pszFilename_in
. If
an error occurred, the output file is deleted and the program exits
with an error message.
The arguments to checkFileError
are as follows:
pOutputFP_in
pszProcessName_in
pszFilename_in
none
5.12.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * fp;
char   filename[100];
...
checkFileError(fp, "Program Name", filename);
fclose(fp);
`fulldisk.c'
#include "record.h" /* or opaclib.h */ void cleanupAfterStdFormatRecord(void);
cleanupAfterStdFormatRecord
frees any memory allocated for
readStdFormatRecord
.
cleanupAfterStdFormatRecord
does not have any arguments.
5.13.3 Return Value none
5.13.4 Example
#include <stdio.h>
#include "record.h"

/* each marker and each code is followed by its own NUL byte */
static CodeTable sLexTable_m = { "\\w\0W\0\\c\0C\0\\f\0F\0\\g\0G\0", 4, "\\w" };
...
int load_lexicon(char * pszLexiconFile_in, int cComment_in)
{
    FILE *   fp;
    unsigned uiRecordCount = 0;
    char *   pRecord;
    /*
     *  open the lexicon file
     */
    if (pszLexiconFile_in == NULL)
        return( 0 );
    fp = fopen(pszLexiconFile_in, "r");
    if (fp == (FILE *)NULL)
        return( 0 );
    /*
     *  load all the records from the lexicon file
     */
    while ((pRecord = readStdFormatRecord(fp,
                                          &sLexTable_m,
                                          cComment_in,
                                          &uiRecordCount)) != NULL)
    {
        ...
    }
    /*
     *  close the lexicon file and erase the temporary data structures
     */
    fclose(fp);
    cleanupAfterStdFormatRecord();
    return( 1 );
}
`record.c'
#include "textctl.h" /* or template.h or opaclib.h */ const unsigned char * convLowerToUpper(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
convLowerToUpper
checks whether the input string begins with a
multibyte lowercase character. If so, it returns the (first)
corresponding multibyte uppercase character.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars
and addLowerUpperWFCharStrings
are
implicitly called by loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to convLowerToUpper
are as follows:
pszString_in
NUL
-terminated character string.
pTextCtl_in
a pointer to a NUL
-terminated string containing the (primary)
corresponding multibyte uppercase character, or NULL
if the input
string does not begin with a multibyte lowercase character. This may
point to a static buffer that may be overwritten by the next call to
convLowerToUpper
.
5.14.4 Example
#include <string.h>
#include "textctl.h"
...
static TextControl   sTextCtl_m;
static StringClass * pStringClasses_m;
static char          szOutxFilename_m[100];
...
loadOutxCtlFile(szOutxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
...
unsigned char * upcaseString(unsigned char * pszString_in)
{
    size_t                iCharSize;
    size_t                iUCSize;
    size_t                iUpperLength = 0;   /* accumulated below */
    unsigned char *       p;
    const unsigned char * pUC;
    unsigned char *       pszUpper;
    unsigned char *       q;

    if (pszString_in == NULL)
        return NULL;
    /*
     *  first pass: compute the length of the uppercased string
     */
    for ( p = pszString_in ; *p ; p += iCharSize ) {
        if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
            iCharSize = 1;
        pUC = convLowerToUpper(p, &sTextCtl_m);
        if (pUC != NULL)
            iUpperLength += strlen((const char *)pUC);
        else
            iUpperLength += iCharSize;
    }
    pszUpper = allocMemory(iUpperLength + 1);
    /*
     *  second pass: copy the string, converting lowercase characters
     */
    for ( p = pszString_in, q = pszUpper ; *p ; p += iCharSize ) {
        if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
            iCharSize = 1;
        pUC = convLowerToUpper(p, &sTextCtl_m);
        if (pUC != NULL) {
            iUCSize = strlen((const char *)pUC);
            memcpy(q, pUC, iUCSize);
            q += iUCSize;
        }
        else {
            memcpy(q, p, iCharSize);
            q += iCharSize;
        }
    }
    pszUpper[iUpperLength] = NUL;
    return pszUpper;
}
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ const StringList * convLowerToUpperSet(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
convLowerToUpperSet
checks whether the input string begins with a
multibyte lowercase character. If so, it returns the complete set of
corresponding multibyte uppercase characters.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars
and addLowerUpperWFCharStrings
are
implicitly called by loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to convLowerToUpperSet
are as follows:
pszString_in
NUL
-terminated character string.
pTextCtl_in
a pointer to a list of NUL
-terminated strings containing the
corresponding multibyte uppercase characters, or NULL
if the input
string does not begin with a multibyte lowercase character. This may
point to a static buffer that may be overwritten by the next call to
convLowerToUpperSet
.
5.15.4 Example
#include <string.h>
#include "textctl.h"
#include "rpterror.h"
...
StringList * upcaseWord(pszWord_in, pTextCtl_in)
char *              pszWord_in;
const TextControl * pTextCtl_in;
{
    size_t             uiCharCount;
    size_t             uiLowerCount;
    size_t             uiNumberAlternatives;
    size_t             uiSpan;
    size_t             uiWordLength;
    size_t             k;
    int                iLength;
    unsigned char *    p;
    StringList *       pUpcaseList = NULL;
    const StringList * pUpperSet;
    const StringList * ps;
    /*
     *  count the number of multibyte characters in the string
     *  count the lowercase letters
     *  calculate the number of (ambiguous) upcase conversions
     *  calculate the maximum length of the upcased word
     */
    uiCharCount          = 0;
    uiLowerCount         = 0;
    uiNumberAlternatives = 1;
    uiWordLength         = 1;   /* count the terminating NUL byte */
    for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength ) {
        iLength = matchAlphaChar(p, pTextCtl_in);
        if (iLength == 0)
            iLength = 1;
        ++uiCharCount;
        if (matchLowercaseChar(p, pTextCtl_in) != 0) {
            ++uiLowerCount;
            pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
            uiNumberAlternatives *= getStringListSize( pUpperSet );
            uiSpan = 0;
            for ( ps = pUpperSet ; ps ; ps = ps->pNext ) {
                k = strlen( ps->pszString );
                if (k > uiSpan)
                    uiSpan = k;
            }
        }
        else
            uiSpan = iLength;
        uiWordLength += uiSpan;
    }
    if (uiLowerCount == 0) {
        /*
         *  the word is already all uppercase
         */
        return addToStringList(NULL, pszWord_in);
    }
    else {
        /*
         *  convert word to all uppercase (possibly ambiguously)
         */
        char * pszCapWord;
        char * pszUpper;
        size_t uiNum;
        int    iUpperLength;
        size_t i;
        size_t j;

        if (uiNumberAlternatives < 1) {
            reportError(ERROR_MSG,
                        "error getting uppercase equivalents for \"%s\"\n",
                        pszWord_in);
            return NULL;
        }
        if (uiNumberAlternatives > 500) {
            reportError(WARNING_MSG,
                "%lu uppercase equivalents is too many: storing only 500\n",
                (unsigned long)uiNumberAlternatives);
            uiNumberAlternatives = 500;
        }
        pszCapWord = allocMemory(uiWordLength);
        for ( i = 0 ; i < uiNumberAlternatives ; ++i ) {
            strcpy(pszCapWord, pszWord_in);
            uiSpan = 1;
            j = 0;
            for ( p = (unsigned char *)pszCapWord ; *p ; p += iLength ) {
                iLength = matchLowercaseChar(p, pTextCtl_in);
                if (iLength != 0) {
                    pUpperSet = convLowerToUpperSet(p, pTextCtl_in);
                    uiNum     = getStringListSize(pUpperSet);
                    pszUpper  = pUpperSet->pszString;
                    if (uiNum > 1) {
                        k = (i / uiSpan) % uiNum;
                        uiSpan *= uiNum;
                        for ( ps = pUpperSet ; ps ; ps = ps->pNext ) {
                            if (k == 0) {
                                pszUpper = ps->pszString;
                                break;
                            }
                            --k;
                        }
                    }
                    /*
                     *  replace the lowercase multibyte character with an
                     *  equivalent uppercase multibyte character
                     */
                    iUpperLength = strlen(pszUpper);
                    if (iUpperLength != iLength)
                        memmove(p + iUpperLength, p + iLength,
                                strlen((char *)p + iLength) + 1);
                    memcpy(p, pszUpper, iUpperLength);
                    iLength = iUpperLength;
                }
                else {
                    iLength = matchAlphaChar(p, pTextCtl_in);
                    if (iLength == 0)
                        iLength = 1;
                }
                ++j;
            }
            pUpcaseList = addToStringList(pUpcaseList, pszCapWord);
        }
        freeMemory( pszCapWord );
    }
    return pUpcaseList;
}
5.15.5 Source File `myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ const unsigned char * convUpperToLower(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
convUpperToLower
checks whether the input string begins with a
multibyte uppercase character. If so, it returns the (first)
corresponding multibyte lowercase character.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars
and addLowerUpperWFCharStrings
are
implicitly called by loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to convUpperToLower
are as follows:
pszString_in
NUL
-terminated character string.
pTextCtl_in
a pointer to a NUL
-terminated string containing the (primary)
corresponding multibyte lowercase character, or NULL
if the input
string does not begin with a multibyte uppercase character. This may
point to a static buffer that may be overwritten by the next call to
convUpperToLower
.
5.16.4 Example
#include <string.h>
#include "textctl.h"
...
static TextControl   sTextCtl_m;
static StringClass * pStringClasses_m;
static char          szIntxFilename_m[100];
...
loadIntxCtlFile(szIntxFilename_m, ';', &sTextCtl_m, &pStringClasses_m);
...
unsigned char * downcaseString(unsigned char * pszString_in)
{
    size_t                iCharSize;
    size_t                iLCSize;
    size_t                iLowerLength = 0;   /* accumulated below */
    unsigned char *       p;
    const unsigned char * pLC;
    unsigned char *       pszLower;
    unsigned char *       q;

    if (pszString_in == NULL)
        return NULL;
    /*
     *  first pass: compute the length of the lowercased string
     */
    for ( p = pszString_in ; *p ; p += iCharSize ) {
        if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
            iCharSize = 1;
        pLC = convUpperToLower(p, &sTextCtl_m);
        if (pLC != NULL)
            iLowerLength += strlen((const char *)pLC);
        else
            iLowerLength += iCharSize;
    }
    pszLower = allocMemory(iLowerLength + 1);
    /*
     *  second pass: copy the string, converting uppercase characters
     */
    for ( p = pszString_in, q = pszLower ; *p ; p += iCharSize ) {
        if ((iCharSize = matchAlphaChar(p, &sTextCtl_m)) == 0)
            iCharSize = 1;
        pLC = convUpperToLower(p, &sTextCtl_m);
        if (pLC != NULL) {
            iLCSize = strlen((const char *)pLC);
            memcpy(q, pLC, iLCSize);
            q += iLCSize;
        }
        else {
            memcpy(q, p, iCharSize);
            q += iCharSize;
        }
    }
    pszLower[iLowerLength] = NUL;
    return pszLower;
}
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ const StringList * convUpperToLowerSet(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
convUpperToLowerSet
checks whether the input string begins with a
multibyte uppercase character. If so, it returns the complete set of
corresponding multibyte lowercase characters.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the mappings between
lowercase and uppercase multibyte characters.
(addLowerUpperWFChars
and addLowerUpperWFCharStrings
are
implicitly called by loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to convUpperToLowerSet
are as follows:
pszString_in
NUL
-terminated character string.
pTextCtl_in
a pointer to a list of NUL
-terminated strings containing the
corresponding multibyte lowercase characters, or NULL
if the input
string does not begin with a multibyte uppercase character. This may
point to a static buffer that may be overwritten by the next call to
convUpperToLowerSet
.
5.17.4 Example
#include <string.h>
#include "textctl.h"
#include "rpterror.h"
...
StringList * downcaseWord(pszWord_in, pTextCtl_in)
char *              pszWord_in;
const TextControl * pTextCtl_in;
{
    size_t             uiCharCount;
    size_t             uiUpperCount;
    size_t             uiNumberAlternatives;
    size_t             uiSpan;
    size_t             uiWordLength;
    size_t             k;
    int                iLength;
    unsigned char *    p;
    StringList *       pDowncaseList = NULL;
    const StringList * pLowerSet;
    const StringList * ps;
    /*
     *  count the number of multibyte characters in the string
     *  count the uppercase letters
     *  calculate the number of (ambiguous) downcase conversions
     *  calculate the maximum length of the downcased word
     */
    uiCharCount          = 0;
    uiUpperCount         = 0;
    uiNumberAlternatives = 1;
    uiWordLength         = 1;   /* count the terminating NUL byte */
    for ( p = (unsigned char *)pszWord_in ; *p != NUL ; p += iLength ) {
        iLength = matchAlphaChar(p, pTextCtl_in);
        if (iLength == 0)
            iLength = 1;
        ++uiCharCount;
        if (matchUppercaseChar(p, pTextCtl_in) != 0) {
            ++uiUpperCount;
            pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
            uiNumberAlternatives *= getStringListSize( pLowerSet );
            uiSpan = 0;
            for ( ps = pLowerSet ; ps ; ps = ps->pNext ) {
                k = strlen( ps->pszString );
                if (k > uiSpan)
                    uiSpan = k;
            }
        }
        else
            uiSpan = iLength;
        uiWordLength += uiSpan;
    }
    if (uiUpperCount == 0) {
        /*
         *  the word is already all lowercase
         */
        return addToStringList(NULL, pszWord_in);
    }
    else {
        /*
         *  convert word to all lowercase (possibly ambiguously)
         */
        char * pszDecapWord;
        char * pszLower;
        size_t uiNum;
        int    iLowerLength;
        size_t i;
        size_t j;

        if (uiNumberAlternatives < 1) {
            reportError(ERROR_MSG,
                        "error getting lowercase equivalents for \"%s\"\n",
                        pszWord_in);
            return NULL;
        }
        if (uiNumberAlternatives > 500) {
            reportError(WARNING_MSG,
                "%lu lowercase equivalents is too many: storing only 500\n",
                (unsigned long)uiNumberAlternatives);
            uiNumberAlternatives = 500;
        }
        pszDecapWord = allocMemory(uiWordLength);
        for ( i = 0 ; i < uiNumberAlternatives ; ++i ) {
            strcpy(pszDecapWord, pszWord_in);
            uiSpan = 1;
            j = 0;
            for ( p = (unsigned char *)pszDecapWord ; *p ; p += iLength ) {
                iLength = matchUppercaseChar(p, pTextCtl_in);
                if (iLength != 0) {
                    pLowerSet = convUpperToLowerSet(p, pTextCtl_in);
                    uiNum     = getStringListSize(pLowerSet);
                    pszLower  = pLowerSet->pszString;
                    if (uiNum > 1) {
                        k = (i / uiSpan) % uiNum;
                        uiSpan *= uiNum;
                        for ( ps = pLowerSet ; ps ; ps = ps->pNext ) {
                            if (k == 0) {
                                pszLower = ps->pszString;
                                break;
                            }
                            --k;
                        }
                    }
                    /*
                     *  replace the uppercase multibyte character with an
                     *  equivalent lowercase multibyte character
                     */
                    iLowerLength = strlen(pszLower);
                    if (iLowerLength != iLength)
                        memmove(p + iLowerLength, p + iLength,
                                strlen((char *)p + iLength) + 1);
                    memcpy(p, pszLower, iLowerLength);
                    iLength = iLowerLength;
                }
                else {
                    iLength = matchAlphaChar(p, pTextCtl_in);
                    if (iLength == 0)
                        iLength = 1;
                }
                ++j;
            }
            pDowncaseList = addToStringList(pDowncaseList, pszDecapWord);
        }
        freeMemory( pszDecapWord );
    }
    return pDowncaseList;
}
`myctype.c'
#include "template.h" /* or opaclib.h */ int decapitalizeWord(WordTemplate * pWord_io, const TextControl * pTextCtl_in);
decapitalizeWord
converts the input word to all lowercase (possibly
ambiguously) and returns a capitalization flag:
0 (NOCAP)
1 (INITCAP)
2 (ALLCAP)
>4
After the conversion to all lowercase, any orthography changes stored
in pTextCtl_in
are applied.
The arguments to decapitalizeWord
are as follows:
pWord_io
pTextCtl_in
the capitalization flag for the word
5.18.4 Example
#include "template.h"       /* includes textctl.h */
...
WordTemplate * buildTemplate(char * pszWord_in, TextControl * pTextCtl_in)
{
    WordTemplate * pTemplate;

    if (pszWord_in == NULL)
        return NULL;
    pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
    pTemplate->pszOrigWord = duplicateString( pszWord_in );
    pTemplate->iCapital    = decapitalizeWord( pTemplate, pTextCtl_in);
    return pTemplate;
}
`textin.c'
#include "rpterror.h" /* or opaclib.h */ void displayNumberedMessage(const NumberedMessage * pMessage_in, int bSilent_in, int bShowWarnings_in, FILE * pLogFP_in, const char * pszFilename_in, unsigned uiLineNumber_in, ...);
displayNumberedMessage
writes a numbered error or warning
message to the standard error output (screen), optionally writing it to
a log file as well. For GUI programs, the programmer must write a
different version of displayNumberedMessage
to satisfy the link
requirements of other functions in the OPAC library. This would
typically display a message box or write to a message window.
The arguments to displayNumberedMessage
are as follows:
pMessage_in
NumberedMessage
data structure that contains the
message type, the message number, and the format string for the
message.
bSilent_in
TRUE
(nonzero).
bShowWarnings_in
TRUE
(nonzero).
pLogFP_in
FILE
pointer to an open log file, or is NULL
.
pszFilename_in
NULL
.
uiLineNumber_in
0
).
...
printf
style format string given by pMessage_in
.
none
5.19.4 Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "opaclib.h"        /* includes rpterror.h */
...
int    bSilent_g       = 0;
int    bShowWarnings_g = 1;
FILE * pLogFP_g        = NULL;
...
static NumberedMessage sCannotOpen_m      = { ERROR_MSG,   100,
    "Cannot open %s file %s" };
static NumberedMessage sIgnoreRedundant_m = { WARNING_MSG, 101,
    "Ignoring all but first \\%s line" };
static char * aszCodes_m[] = { "\\lexicon", "\\grammar", ... NULL };
...
FILE *   pControlFP;
char *   pszControlFile;
unsigned uiLineNumber;
char *   pszLexFile;
char **  ppszField;
char *   p;
unsigned i;
...
pControlFP = fopen(pszControlFile, "r");
if (pControlFP == (FILE *)NULL) {
    /* no file name or line number applies, so pass NULL and 0 */
    displayNumberedMessage(&sCannotOpen_m,
                           bSilent_g, bShowWarnings_g, pLogFP_g,
                           NULL, 0,
                           "control", pszControlFile);
    exit(1);
}
uiLineNumber = 1;
while ((ppszField = readStdFormatField(pControlFP, aszCodes_m, NUL)) != NULL) {
    switch (**ppszField) {
    case 1:     /* "\\lexicon" */
        if (pszLexFile != (char *)NULL)
            displayNumberedMessage(&sIgnoreRedundant_m,
                                   bSilent_g, bShowWarnings_g, pLogFP_g,
                                   pszControlFile, uiLineNumber,
                                   "lexicon");
        else {
            p = strtok(ppszField[0]+1, " \t\r\n\f\v");
            pszLexFile = buildAdjustedFilename(p, pszControlFile, ".lex");
        }
        break;
    ...
    }
    ...
    for ( i = 0 ; ppszField[i] ; ++i )
        ++uiLineNumber;
}
...
`textin.c'
#include "allocmem.h" /* or opaclib.h */ char * duplicateString(const char * pszString_in);
duplicateString
creates a copy of an existing NUL
-terminated
character string. It calls allocMemory
to get the memory to
store the copy of the string. If pszString_in
is NULL
,
then duplicateString
returns NULL
.
This is the same as the standard function strdup
, except that it
calls allocMemory
instead of malloc
.
duplicateString
has one argument:
pszString_in
NUL
-terminated character string.
a pointer to the newly allocated and copied duplicate string
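The behavior described above can be sketched in a few lines. This is only an illustration, not the library's source: the name sketchDuplicateString is hypothetical, and the sketch calls malloc directly where the library goes through its own allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for duplicateString(); not the library's code.
 * It uses malloc() directly and returns NULL if the allocation fails. */
char * sketchDuplicateString(const char * pszString_in)
{
    char * pszCopy;
    size_t uiSize;

    if (pszString_in == NULL)
        return NULL;                        /* NULL in, NULL out */
    uiSize  = strlen(pszString_in) + 1;     /* include the terminating NUL */
    pszCopy = malloc(uiSize);
    if (pszCopy != NULL)
        memcpy(pszCopy, pszString_in, uiSize);
    return pszCopy;
}
```

The caller owns the returned copy and must release it with the matching deallocation function.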
5.20.4 Example
#include "template.h"       /* includes textctl.h */
...
WordTemplate * buildTemplate(char * pszWord_in, TextControl * pTextCtl_in)
{
    WordTemplate * pTemplate;

    if (pszWord_in == NULL)
        return NULL;
    pTemplate = (WordTemplate *)allocMemory(sizeof(WordTemplate));
    pTemplate->pszOrigWord = duplicateString( pszWord_in );
    pTemplate->iCapital    = decapitalizeWord( pTemplate, pTextCtl_in);
    return pTemplate;
}
`allocmem.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * duplicateStringList(const StringList * pList_in);
duplicateStringList
copies a list of strings to create another,
identical list of strings. If pList_in
is NULL
, then
duplicateStringList
returns NULL
.
duplicateStringList
has one argument:
pList_in
a pointer to the new list of dynamically allocated strings
5.21.4 Example
#include "strlist.h"
...
StringList * pList1;
StringList * pList2;
...
pList2 = duplicateStringList(pList1);
...
freeStringList( pList2 );
pList2 = NULL;
`copy_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ int equivalentStringLists(const StringList * pFirst_in, const StringList * pSecond_in);
equivalentStringLists
tests whether or not two string lists
contain the same strings. The strings do not have to be in the same
order in the two lists. Duplicate strings in either list are
immaterial.
The arguments to equivalentStringLists
are as follows:
pFirst_in
pSecond_in
nonzero (TRUE) if the lists are equivalent, otherwise zero (FALSE)
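Because order and duplicates are ignored, equivalence amounts to each list being a subset of the other. The following is a minimal sketch of that idea, assuming a simplified list node; the SketchList type and the sketch... function names are hypothetical and are not the library's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified list node used only for this sketch. */
typedef struct sketch_list {
    const char *         pszString;
    struct sketch_list * pNext;
} SketchList;

/* Return 1 if psz is stored somewhere in pList. */
static int sketchIsMember(const SketchList * pList, const char * psz)
{
    for ( ; pList != NULL ; pList = pList->pNext )
        if (strcmp(pList->pszString, psz) == 0)
            return 1;
    return 0;
}

/* Return 1 if every string in pA also occurs in pB. */
static int sketchIsSubset(const SketchList * pA, const SketchList * pB)
{
    for ( ; pA != NULL ; pA = pA->pNext )
        if (!sketchIsMember(pB, pA->pszString))
            return 0;
    return 1;
}

/* Two lists are equivalent when each is a subset of the other. */
int sketchEquivalentLists(const SketchList * pFirst,
                          const SketchList * pSecond)
{
    return sketchIsSubset(pFirst, pSecond) && sketchIsSubset(pSecond, pFirst);
}
```

In this sketch two empty lists count as equivalent; whether the library treats that case the same way is an assumption.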
5.22.4 Example
#include "strlist.h"
...
StringList * pList1;
StringList * pList2;
...
if (equivalentStringLists(pList1, pList2)) {
    ...
}
`equiv_sl.c'
#include "opaclib.h" char * eraseCharsInString(char * pszString_io, const char * pszEraseChars_in);
eraseCharsInString
erases any characters from
pszEraseChars_in
that are found in pszString_io
, possibly
shortening pszString_io
as a side-effect.
The arguments to eraseCharsInString
are as follows:
pszString_io
pszEraseChars_in
a pointer to the possibly modified string
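One common way to erase characters in place is to copy the kept characters forward over the erased ones. The sketch below illustrates that technique under stated assumptions: the name sketchEraseChars is hypothetical, and returning the input unchanged when either argument is NULL is a choice made for the sketch, not documented library behavior.

```c
#include <string.h>

/* Hypothetical stand-in for eraseCharsInString(); not the library's code. */
char * sketchEraseChars(char * pszString_io, const char * pszEraseChars_in)
{
    char * pSrc;
    char * pDst;

    if ((pszString_io == NULL) || (pszEraseChars_in == NULL))
        return pszString_io;        /* assumed: nothing to do */
    for ( pSrc = pDst = pszString_io ; *pSrc != '\0' ; ++pSrc ) {
        if (strchr(pszEraseChars_in, *pSrc) == NULL)
            *pDst++ = *pSrc;        /* keep this character */
    }
    *pDst = '\0';                   /* the string may now be shorter */
    return pszString_io;
}
```

Since the string is modified in place, it must be writable; a string literal should be copied first.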
5.23.4 Example
#include "opaclib.h"        /* includes allocmem.h */
...
static char szMarkers_m[] = "-=#";
...
static int get_score(pszMarkedWord_in)
const char * pszMarkedWord_in;
{
    char * pszWord;
    int    iScore = 0;

    if (pszMarkedWord_in != NULL) {
        pszWord = eraseCharsInString(duplicateString(pszMarkedWord_in),
                                     szMarkers_m);
        ...
        freeMemory(pszWord);
    }
    return iScore;
}
`erasecha.c'
#include "trie.h" /* or opaclib.h */ void eraseTrie(Trie * pTrieHead_io, void (* pfEraseInfo_in)(void * pList_io));
eraseTrie
walks through a trie, freeing all the memory allocated
for the trie and for the information it stores.
The arguments to eraseTrie
are as follows:
pTrieHead_io
pfEraseInfo_in
pList_io
none
5.24.4 Example
#include "trie.h"
#include "allocmem.h"
...
typedef struct lex_item {
    struct lex_item * pLink;        /* link to next element */
    struct lex_item * pNext;        /* link to next homograph */
    unsigned char *   pszForm;      /* lexical form (word) */
    unsigned char *   pszGloss;     /* lexical gloss */
    unsigned short    uiCategory;   /* lexical category */
} LexItem;
...
Trie *        pLexicon_g;
unsigned long uiLexiconCount_g;
...
static void erase_lex_item(void * pList)
{
    LexItem * pItem;
    LexItem * pNextItem;

    for ( pItem = (LexItem *)pList ; pItem ; pItem = pNextItem ) {
        pNextItem = pItem->pLink;
        if (pItem->pszForm != NULL)
            freeMemory(pItem->pszForm);
        if (pItem->pszGloss != NULL)
            freeMemory(pItem->pszGloss);
        freeMemory(pItem);
    }
}

void free_lexicon()
{
    if (pLexicon_g != NULL) {
        eraseTrie(pLexicon_g, erase_lex_item);
        pLexicon_g = NULL;
    }
    uiLexiconCount_g = 0L;
}
`trie.c'
#include "opaclib.h" int exitSafely(int iCode_in);
exitSafely
replaces exit
. When compiled for Microsoft
Windows, the program should define exitSafely
to not call
exit
because Windows doesn't like that very much!
exitSafely
has one argument:
iCode_in
none, but it must be defined as returning int to keep everyone happy
5.25.4 Example
#include <stdlib.h>
#include <string.h>         /* declares strdup on most systems */
#include "opaclib.h"
...
char * pszCopy;
...
pszCopy = strdup("This is a test!");
if (pszCopy == NULL) {
    ...
    exitSafely(2);
}
`safeexit.c'
#include "opaclib.h" void fcloseWithErrorCheck(FILE * pOutputFP_in, const char * pszFilename_in);
fcloseWithErrorCheck
checks the output file for write
errors, and closes it. If an error is detected, it is reported using
reportError
.
The arguments to fcloseWithErrorCheck
are as follows:
pOutputFP_in
pszFilename_in
none
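The check-then-close sequence can be sketched with the standard ferror and fclose calls. This is an illustration only: the name sketchCloseChecked is hypothetical, the sketch writes its message to stderr rather than calling reportError, and it returns a status (unlike fcloseWithErrorCheck) so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical stand-in for fcloseWithErrorCheck(); not the library's code.
 * Returns 1 if a write or close error was detected, otherwise 0. */
int sketchCloseChecked(FILE * pOutputFP_in, const char * pszFilename_in)
{
    int bError;

    if (pOutputFP_in == NULL)
        return 0;                           /* nothing to close */
    bError = ferror(pOutputFP_in);          /* any earlier write error? */
    if (fclose(pOutputFP_in) != 0)          /* flushing may also fail */
        bError = 1;
    if (bError)
        fprintf(stderr, "Error writing %s\n",
                (pszFilename_in != NULL) ? pszFilename_in : "(unknown)");
    return bError != 0;
}
```

Checking the result of fclose matters because buffered data is often written only when the file is closed, which is where a full disk typically shows up.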
5.26.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * pOutput;
char * pszFilename;
...
pOutput = fopen(pszFilename, "w");
if (pOutput != NULL) {
    ...
    fcloseWithErrorCheck(pOutput, pszFilename);
    pOutput = NULL;
}
`errcheck.c'
#include "trie.h" /* or opaclib.h */ void * findDataInTrie(const Trie * pTrieHead_in, const char * pszKey_in);
findDataInTrie
searches the trie for information stored using
the key for access. The pointer returned is guaranteed to point only to
the desired information when the key is shorter than the maximum depth
of the trie. For longer keys, the list (or array) may also hold entries
stored under other keys, so you may need to scan it to get exactly what
you want.
The arguments to findDataInTrie
are as follows:
pTrieHead_in
pszKey_in
a pointer to the generic information found in the trie, or NULL
if
the search fails
5.27.4 Example
#include <string.h>
#include "trie.h"
...
typedef struct lex_item {
    struct lex_item * pLink;        /* link to next element */
    struct lex_item * pNext;        /* link to next homograph */
    unsigned char *   pszForm;      /* lexical form (word) */
    unsigned char *   pszGloss;     /* lexical gloss */
    unsigned short    uiCategory;   /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
...
LexItem * find_entries(unsigned char * pszWord_in)
{
    LexItem * pLex;

    for ( pLex = findDataInTrie(pLexicon_g, (const char *)pszWord_in) ;
          pLex ;
          pLex = pLex->pLink ) {
        if (strcmp((const char *)pLex->pszForm,
                   (const char *)pszWord_in) == 0) {
            /*
             *  since add_lex_item() links the homographs together,
             *  this points to a list containing only the homographs
             */
            return pLex;
        }
    }
    return NULL;
}
`trie.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ StringClass * findStringClass(const char * pszName_in, const StringClass * pClasses_in);
findStringClass
searches a list of string classes for a specific
string class by name.
The arguments to findStringClass
are as follows:
pszName_in
pClasses_in
a pointer to the string class found, or NULL
if not found
5.28.4 Example
#include "strclass.h"
#include "rpterror.h"
...
static StringClass * pClasses_m = NULL;
...
StringClass * pClass;
char *        pszClassName;
...
pClass = findStringClass( pszClassName, pClasses_m);
if (pClass == NULL)
    reportError(WARNING_MSG, "Undefined class %s\n", pszClassName);
...
`strcla.c'
#include "allocmem.h" /* or opaclib.h */ char * fitAllocStringExactly(char * pszString_in);
fitAllocStringExactly
shrinks the allocated buffer to exactly
fit the string. The program is aborted with an error message if it
somehow runs out of memory.
(See section 5.8 allocMemory,
for details about this error message.)
fitAllocStringExactly
has one argument:
pszString_in
a pointer to the (possibly) reallocated block
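Shrinking a buffer to fit its string is typically done with realloc. The sketch below shows the idea under stated assumptions: sketchFitStringExactly is a hypothetical name, and where the library aborts on an allocation failure, the sketch simply keeps the original (larger) block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fitAllocStringExactly(); not the library's code.
 * The argument must point to a block obtained from malloc() or realloc(). */
char * sketchFitStringExactly(char * pszString_in)
{
    char * pszFitted;

    if (pszString_in == NULL)
        return NULL;
    /* keep the string plus its terminating NUL, release the rest */
    pszFitted = realloc(pszString_in, strlen(pszString_in) + 1);
    return (pszFitted != NULL) ? pszFitted : pszString_in;
}
```

Because realloc may move the block, the caller must use the returned pointer and discard the old one.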
5.29.4 Example
#include <stdio.h>
#include <string.h>
#include "allocmem.h"
...
char * read_line(FILE * pInputFP_in)
{
    char * pszBuffer;
    size_t uiBufferSize = 500;
    size_t uiLineLength;

    if ((pInputFP_in == NULL) || feof(pInputFP_in))
        return NULL;
    pszBuffer = allocMemory(uiBufferSize);
    if (fgets(pszBuffer, uiBufferSize, pInputFP_in) == NULL) {
        freeMemory(pszBuffer);
        return NULL;
    }
    /* grow the buffer until the whole line has been read */
    while (strchr(pszBuffer, '\n') == NULL) {
        uiBufferSize += 500;
        pszBuffer = reallocMemory(pszBuffer, uiBufferSize);
        uiLineLength = strlen(pszBuffer);
        if (fgets(pszBuffer + uiLineLength, uiBufferSize - uiLineLength,
                  pInputFP_in) == NULL)
            break;
    }
    return fitAllocStringExactly( pszBuffer );
}
`allocmem.c'
#include "opaclib.h" void fixSynthesizedWord(WordTemplate * pTemplate_io, const TextControl * pTextCtl_in);
fixSynthesizedWord
applies the output orthography changes and
recapitalization to the list of synthesized wordforms. The list is
updated to reflect these changes, and to minimize any ensuing
ambiguity.
The arguments to fixSynthesizedWord
are as follows:
pTemplate_io
pTextCtl_in
none
5.30.4 Example
#include "template.h"
...
TextControl sTextControl_g;
...
FILE *         pInputFP;
FILE *         pOutputFP;
WordTemplate * pWord;
...
for (;;) {
    pWord = readTemplateFromAnalysis(pInputFP, &sTextControl_g);
    if (pWord == NULL)
        break;
    pWord->pNewWords = synthesize_word(pWord->pAnalyses, &sTextControl_g);
    fixSynthesizedWord(pWord, &sTextControl_g);
    writeTextFromTemplate( pOutputFP, pWord, &sTextControl_g);
    freeWordTemplate( pWord );
}
`textout.c'
#include "opaclib.h" FILE * fopenAlways(char * pszFilename_io, const char * pszMode_in);
fopenAlways
opens a file, prompting the user if necessary and
retrying until successful. If it is not NULL
,
pszFilename_io
is updated to contain the name of the file
actually opened. fopenAlways
uses fopen
to open the
file, and repeatedly prompts the user for a filename if fopen
fails.
The buffer pointed to by pszFilename_io must be (at least)
FILENAME_MAX
bytes long. If FILENAME_MAX
is not defined
by `stdio.h', then it is assumed to be 128.
pszFilename_io
NULL
.
pszMode_in
fopen
mode string (usually "r"
or
"w"
).
a valid FILE pointer
5.31.4 Example
#include <stdio.h>
#include "opaclib.h"
...
FILE * pInputFP;
char   szFilename[FILENAME_MAX];
...
pInputFP = fopenAlways(szFilename, "r");
...
fclose(pInputFP);
pInputFP = NULL;
`ufopen.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */ void freeChangeList(Change * pList_io);
freeChangeList
frees the memory allocated for a list of
consistent change structures.
freeChangeList
has one argument:
pList_io
none
5.32.4 Example
#include "change.h"
...
Change * pChangeList_g;
...
void add_change(char * pszChange_in)
{
    Change * pTail;

    if (pChangeList_g == NULL)
        pChangeList_g = parseChangeString( pszChange_in );
    else {
        for (pTail = pChangeList_g ; pTail->pNext ; pTail = pTail->pNext)
            ;
        pTail->pNext = parseChangeString( pszChange_in );
    }
}
...
freeChangeList( pChangeList_g );
pChangeList_g = NULL;
`change.c'
5.33.1 Syntax
#include "record.h" void freeCodeTable(CodeTable * pCodeTable_io);
freeCodeTable
frees the memory allocated for a CodeTable
data structure.
freeCodeTable
has only one argument:
pCodeTable_io
CodeTable
data structure that contains information
that is no longer needed.
none
5.33.4 Example
#include "record.h"
#include "ample.h"

AmpleData sAmpleData_g;
char      szCodesFilename_g[100];
char      szDictFilename_g[100];
...
loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE);
...
loadAmpleDictionary(szDictFilename_g, PFX, &sAmpleData_g);
freeCodeTable( sAmpleData_g.pPrefixTable );
sAmpleData_g.pPrefixTable = NULL;
`free_ct.c'
#include "allocmem.h" /* or opaclib.h */ void freeMemory(void * pBlock_io);
freeMemory
provides a "safe" interface to free
. It
ignores NULL
as an argument. (But passing NULL
is still
poor practice!) This is the only protection added to free
:
passing random memory addresses to freeMemory
, or passing the
same address twice, will result in memory corruption and program
crashes!
freeMemory
has one argument:
pBlock_io
none
5.34.4 Example
#include <stdio.h>
#include "allocmem.h"
...
char * read_line(FILE * pInputFP_in)
{
    char * pszBuffer;
    size_t uiBufferSize = 500;
    size_t uiLineLength;

    if ((pInputFP_in == NULL) || feof(pInputFP_in))
        return NULL;
    pszBuffer = allocMemory(uiBufferSize);
    if (fgets(pszBuffer, uiBufferSize, pInputFP_in) == NULL) {
        freeMemory(pszBuffer);
        return NULL;
    }
    return pszBuffer;
}
`allocmem.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ void freeStringClasses(StringClass * pClasses_io);
freeStringClasses
frees the memory allocated for the list of
string classes.
freeStringClasses
has one argument:
pClasses_io
none
5.35.4 Example
#include "change.h"         /* includes strclass.h */
...
static Change *      pChanges_m;
static StringClass * pClasses_m;
...
void free_change_info()
{
    freeChangeList( pChanges_m );
    freeStringClasses( pClasses_m );
    pChanges_m = NULL;
    pClasses_m = NULL;
}
`strcla.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ void freeStringList(StringList * pList_io);
freeStringList
deletes a list of strings, freeing all the memory
used by the list of strings.
freeStringList
has one argument:
pList_io
none
5.36.4 Example
#include "strlist.h"
...
StringList * pNames_g;
...
freeStringList(pNames_g);
pNames_g = NULL;
...
`free_sl.c'
#include "template.h" void freeWordAnalysisList(WordAnalysis * pAnalyses_io);
freeWordAnalysisList
frees the memory allocated for a list of
WordAnalysis
data structures.
freeWordAnalysisList
has one argument:
pAnalyses_io
WordAnalysis
data structures.
none
5.37.4 Example
#include "template.h"
...
WordTemplate * pWord;
...
if (pWord->pAnalyses != NULL)
    freeWordAnalysisList(pWord->pAnalyses);
...
`wordanal.c'
#include "template.h" /* or opaclib.h */ void freeWordTemplate(WordTemplate * pWord_io);
freeWordTemplate
frees everything in a WordTemplate
data
structure, including the structure itself.
freeWordTemplate
has one argument:
pWord_io
WordTemplate
data structure to free.
none
5.38.4 Example
#include "template.h"
...
TextControl sTextCtl_g;
...
WordAnalysis * merge_analyses(WordAnalysis * pList_in, WordAnalysis * pAnal_in)
{
    ...
}
...
void process(FILE * pInputFP_in, FILE * pOutputFP_in)
{
    WordTemplate * pWord;
    WordAnalysis * pAnal;
    unsigned       i;
    unsigned       uiAmbiguityCount;
    unsigned long  uiWordCount;

    for ( uiWordCount = 0L ;; ) {
        pWord = readTemplateFromText(pInputFP_in, &sTextCtl_g);
        if (pWord == NULL)
            break;
        uiAmbiguityCount = 0;
        if (pWord->paWord != NULL) {
            for ( i = 0 ; pWord->paWord[i] ; ++i ) {
                pAnal = analyze(pWord->paWord[i]);
                pWord->pAnalyses = merge_analyses(pWord->pAnalyses, pAnal);
            }
            for (pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext)
                ++uiAmbiguityCount;
        }
        uiWordCount = showAmbiguousProgress(uiAmbiguityCount, uiWordCount);
        writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
        freeWordTemplate(pWord);
    }
}
`free_wt.c'
#include "allocmem.h" /* or opaclib.h */ unsigned long getAndClearAllocMemorySum(void);
getAndClearAllocMemorySum
returns the amount of memory used by
allocMemory
calls since the last call to
getAndClearAllocMemorySum
. It does not account for calls to
freeMemory
, which greatly reduces its accuracy.
getAndClearAllocMemorySum
does not have any arguments.
5.39.3 Return Value
the number of bytes of memory requested by allocMemory
calls
since the last call to getAndClearAllocMemorySum
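The following sketch illustrates why the sum counts requests rather than memory in use. It is an assumption about how such a counter could work, not the library's source: sketchAlloc and sketchGetAndClearSum are hypothetical names, and the sketch wraps malloc directly.

```c
#include <stdlib.h>

/* running total of bytes requested since the last reset */
static unsigned long uiSketchSum_m = 0;

/* Hypothetical allocator wrapper that records each successful request. */
void * sketchAlloc(size_t uiSize_in)
{
    void * p = malloc(uiSize_in);

    if (p != NULL)
        uiSketchSum_m += (unsigned long)uiSize_in;
    return p;
}

/* Return the running total and reset it to zero. */
unsigned long sketchGetAndClearSum(void)
{
    unsigned long uiSum = uiSketchSum_m;

    uiSketchSum_m = 0;
    return uiSum;
}
```

Freeing a block does not touch the counter, so the value reported is the total requested, even if some of that memory has already been released.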
5.39.4 Example
#include <stdio.h>
#include "allocmem.h"
...
getAndClearAllocMemorySum();    /* reset the counter */
...
p = allocMemory(500);
...
p = duplicateString("this is a test");
...
printf("%lu bytes allocated recently\n", getAndClearAllocMemorySum());
`allocmem.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */ int getChangeQuote(const char * pszMatch_in, const char * pszReplace_in);
getChangeQuote
finds a suitable "quote" character that is not
used in either input string.
The arguments to getChangeQuote
are as follows:
pszMatch_in
pszReplace_in
a character suitable for quoting the match and replace strings
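One way to pick such a character is to try a fixed list of candidates and return the first that appears in neither string. The sketch below is illustrative only: the name sketchChooseQuote, the candidate list and its order, and the return value of 0 when no candidate is free are all assumptions, not documented behavior of getChangeQuote.

```c
#include <string.h>

/* Hypothetical stand-in for getChangeQuote(); not the library's code. */
int sketchChooseQuote(const char * pszMatch_in, const char * pszReplace_in)
{
    static const char szCandidates[] = "\"'/|#*+=!";   /* assumed order */
    const char *      p;

    if (pszMatch_in == NULL)
        pszMatch_in = "";
    if (pszReplace_in == NULL)
        pszReplace_in = "";
    for ( p = szCandidates ; *p != '\0' ; ++p ) {
        if ((strchr(pszMatch_in, *p) == NULL) &&
            (strchr(pszReplace_in, *p) == NULL))
            return *p;              /* unused in both strings */
    }
    return 0;                       /* assumed: no candidate is free */
}
```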
5.40.4 Example
#include <stdio.h>
#include <string.h>
#include "change.h"
#include "allocmem.h"

char * composeChangeString(pszMatch_in, pszReplace_in, pszEnvir_in)
const char * pszMatch_in;
const char * pszReplace_in;
const char * pszEnvir_in;
{
    char * pszChange;
    size_t uiLength;
    char   cQuote;

    if ((pszMatch_in == NULL) && (pszReplace_in == NULL))
        return NULL;
    if (pszMatch_in == NULL)
        pszMatch_in = "";
    if (pszReplace_in == NULL)
        pszReplace_in = "";
    /* four quotes, one space, and the terminating NUL account for the 6 */
    uiLength = strlen( pszMatch_in ) + strlen( pszReplace_in ) + 6;
    if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
        uiLength += strlen( pszEnvir_in ) + 1;
    pszChange = allocMemory(uiLength);
    cQuote = getChangeQuote(pszMatch_in, pszReplace_in);
    sprintf(pszChange, "%c%s%c %c%s%c",
            cQuote, pszMatch_in, cQuote, cQuote, pszReplace_in, cQuote);
    if ((pszEnvir_in != NULL) && (*pszEnvir_in != '\0'))
        strcat(strcat(pszChange, " "), pszEnvir_in);
    return pszChange;
}
`change.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ unsigned getStringListSize(const StringList * pList_in);
getStringListSize
counts the number of strings stored in the
list. It does not check for duplicate strings or for NULL
string pointers, just for the total number of data structures linked
together.
getStringListSize
has one argument:
pList_in
the number of strings in the list
5.41.4 Example
#include <stdio.h>
#include "strlist.h"
...
void writeAmbigWords(pList_in, cAmbig_in, pOutputFP_in)
const StringList * pList_in;
int                cAmbig_in;
FILE *             pOutputFP_in;
{
    char szAmbig[2];

    if (pList_in == NULL)
        fprintf(pOutputFP_in, "%c0%c%c", cAmbig_in, cAmbig_in, cAmbig_in);
    else if (pList_in->pNext) {
        fprintf(pOutputFP_in, "%c%u%c",
                cAmbig_in, getStringListSize(pList_in), cAmbig_in );
        szAmbig[0] = cAmbig_in;
        szAmbig[1] = '\0';
        writeStringList( pList_in, szAmbig, pOutputFP_in );
        fprintf(pOutputFP_in, "%c", cAmbig_in);
    }
    else
        fputs(pList_in->pszString, pOutputFP_in);
}
`size_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ int identicalStringLists(const StringList * pFirstList_in, const StringList * pSecondList_in);
identicalStringLists
checks whether or not two lists of strings
are identical, that is, whether they have the same strings in the same
order.
The arguments to identicalStringLists
are as follows:
pFirstList_in
pSecondList_in
nonzero (TRUE) if the lists are identical, otherwise zero (FALSE)
5.42.4 Example
#include "strlist.h" ... StringList * pList1; StringList * pList2; ... if (identicalStringLists(pList1, pList2)) { ... }
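As a rough sketch of what "the same strings in the same order" means, the comparison can be written as a single pass over both lists. The names `CmpNode` and `demoIdenticalLists` are hypothetical, and the sketch assumes every node holds a non-NULL string; the library's own handling of NULL strings is not documented here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for StringList. */
typedef struct cmp_node {
    const char *      pszString;
    struct cmp_node * pNext;
} CmpNode;

/* Walk both lists in step; they are identical only if every pair of
   strings matches and both lists end at the same time. */
int demoIdenticalLists(const CmpNode * pFirst_in, const CmpNode * pSecond_in)
{
    while ((pFirst_in != NULL) && (pSecond_in != NULL)) {
        if (strcmp(pFirst_in->pszString, pSecond_in->pszString) != 0)
            return 0;
        pFirst_in  = pFirst_in->pNext;
        pSecond_in = pSecond_in->pNext;
    }
    return (pFirst_in == NULL) && (pSecond_in == NULL);
}
```

Under this reading, two lists holding the same strings in a different order are not identical, and neither are two lists where one is a prefix of the other.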
`equal_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ int isMemberOfStringList(const StringList * pList_in, const char * pszString_in);
isMemberOfStringList
checks whether a string is stored in a list
of strings.
The arguments to isMemberOfStringList
are as follows:
pList_in
pszString_in
nonzero (TRUE) if the string is found in the list, otherwise zero (FALSE)
5.43.4 Example
#include "strlist.h" ... static StringList * pFiles_m = NULL; ... void processFileOnce(const char * pszFile_in) { if ((pszFile_in != NULL) && !isMemberOfStringList(pFiles_m, pszFile_in)) { pFiles_m = mergeIntoStringList(pFiles_m, pszFile_in); ... } }
`membr_sl.c'
#include "opaclib.h" char * isolateWord(char * pszLine_io);
isolateWord
isolates the "word" pointed to by its argument by
replacing the first whitespace character following the word with a NUL
character. It then steps the pointer to the beginning of the next
"word" in the input string.
isolateWord
skips over any leading whitespace in the input string
before trying to isolate a "word".
isolateWord
has one argument:
pszLine_io
NUL
-terminated character string.
a pointer to the first character of the next following word, which may be
the NUL
character at the end of the input string
5.44.4 Example
    #include <string.h>
    #include "opaclib.h"            /* includes strlist.h */
    ...
    StringList * pTraceMorphs_m = NULL;
    ...
    void addTraceMorphs(char * pszLine_in)
    {
        char * pszMorph;
        char * pszEnd;

        if (pszLine_in == NULL)
            return;
        for ( pszMorph = pszLine_in + strspn(pszLine_in, " \r\n\t\f\v") ;
              *pszMorph ;
              pszMorph = pszEnd )
        {
            pszEnd = isolateWord( pszMorph );   /* isolate the morpheme */
            if (strcmp(pszMorph, "0") == 0)     /* If 0, put in NUL */
                *pszMorph = NUL;
            pTraceMorphs_m = mergeIntoStringList(pTraceMorphs_m, pszMorph);
        }
    }
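The in-place behavior of `isolateWord` can be hard to picture from the description alone. The following is a minimal sketch of the technique, assuming "whitespace" means whatever `isspace` accepts; `demoIsolateWord` is a hypothetical name and is not the code in `isolatew.c`.

```c
#include <ctype.h>
#include <stddef.h>

/* Terminate the first word in the buffer with a NUL and return a pointer
   to the start of the following word (or to the final NUL). */
char * demoIsolateWord(char * pszLine_io)
{
    char * p = pszLine_io;

    if (p == NULL)
        return NULL;
    while ((*p != '\0') && isspace((unsigned char)*p))
        ++p;                            /* skip leading whitespace */
    while ((*p != '\0') && !isspace((unsigned char)*p))
        ++p;                            /* scan over the word itself */
    if (*p != '\0')
        *p++ = '\0';                    /* overwrite the first whitespace */
    while ((*p != '\0') && isspace((unsigned char)*p))
        ++p;                            /* step to the next word */
    return p;
}
```

Because the buffer is modified, the caller must pass writable storage (an array or allocated memory), never a string literal. The caller's own pointer is not advanced past leading whitespace, which is why the example above skips it with `strspn` before entering the loop.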
`isolatew.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ int isStringClassMember(const char * pszString_in, const StringClass * pClass_in);
isStringClassMember
searches a string class for a specific
string.
The arguments to isStringClassMember
are as follows:
pszString_in
pClass_in
nonzero (TRUE) if the string is found in the class, otherwise zero (FALSE)
5.45.4 Example
#include "strclass.h" ... static StringClass * pClasses_m; ... int isClassMember(const char * pszString_in, const char * pszClassName_in) { StringClass * pClass; pClass = findStringClass(pszClassName_in, pClasses_m); if (pClass == NULL) return 0; return isStringClassMember(pszString_in, pClass); }
`strcla.c'
#include "textctl.h" /* or template.h or opaclib.h */ int loadIntxCtlFile(const char * pszFilename_in, int cComment_in, TextControl * pTextCtl_out, StringClass ** ppStringClasses_io);
loadIntxCtlFile
loads a text input control file into memory.
This is a standard format file containing one data record with the
following fields (not necessarily in this order):
\ambig
\ambig
field is optional, and may occur only once.
\barchar
|
) in
the S.I.L. Manuscripter program in the early 1980's. The
\barchar
field is optional, and may occur only once. An empty
field disables this feature.
\barcodes
\barchar
character
to form formatting commands. Whitespace (spaces, tabs, or newlines) in
this field is optional. The \barcodes
field is optional, and
may occur any number of times. Its effect is cumulative.
\ch
\ch
field is optional, and may
occur any number of times. An ordered list of consistent changes is
built by the function. Each change is applied to each input word as
many times as necessary before the next change is applied.
\dsc
\dsc
field is optional, and may occur only once.
\excl
\format
field in the text input
control file. The \excl
field lists one or more field codes
(formatting commands) complete with the leading \format
character. Field codes are separated by whitespace (spaces, tabs, or
newlines). The \excl
field is optional, and may occur any
number of times. Its effect is cumulative. If any \excl
fields
occur, then no \incl
fields are allowed, and all fields in the
input file that are not explicitly listed in a \excl
field will
be processed.
\format
\format
field is optional, and may occur only once.
\incl
\format
field in the text input control
file. The \incl
field lists one or more field codes (formatting
commands) complete with the leading \format
character. Field
codes are separated by whitespace (spaces, tabs, or newlines). The
\incl
field is optional, and may occur any number of times. Its
effect is cumulative. If any \incl
fields occur, then no
\excl
fields are allowed, and only those fields in the input
file that are explicitly listed in a \incl
field will be
processed.
\luwfc
\luwfc
field is optional, and may occur
any number of times. Its effect is cumulative. For lowercase and
uppercase forms that are represented by two or more adjacent characters
(bytes), use the \luwfcs
field described below.
\luwfcs
\luwfcs
field is optional, and may occur any
number of times. Its effect is cumulative. Note that \luwfcs
fields may be used to replace \luwfc
fields, or the two types of
fields may be mixed together in the control file.
The implementation underlying the \luwfcs
field does not require
that the lowercase and uppercase forms occupy the same number of
characters (bytes).
\maxdecap
\maxdecap
field is
optional, and may occur only once.
\nocap
\luwfc
and \luwfcs
fields
should not be used. The \nocap
field is optional, and may occur
only once.
\noincap
\noincap
field is optional, and may
occur only once.
\scl
\scl
field is
optional, and any number of string classes may be defined. A string
class definition must occur before any \ch
field that uses that
string class.
\wfc
\wfc
field is
optional, and may occur any number of times. Its effect is cumulative.
For caseless forms that are represented by two or more adjacent characters
(bytes), use the \wfcs
field described below.
\wfcs
\wfcs
field is optional, and may occur
any number of times. Its effect is cumulative. Note that \wfcs
fields may be used to replace \wfc
fields, or the two types of
fields may be mixed together in the control file.
For more details about this file, see section `Text Input Control File' in AMPLE Reference Manual.
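The interaction between the `\incl` and `\excl` fields described above amounts to a simple filter on field codes. The sketch below is only an illustration of that rule: `demoFieldIsProcessed` and its NULL-terminated array arguments are hypothetical, and the behavior when neither field is present (process everything) is an assumption rather than something stated in this section.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if pszCode_in appears in the NULL-terminated array. */
static int demoCodeListed(const char * pszCode_in,
                          const char * const * ppszCodes_in)
{
    size_t i;

    if (ppszCodes_in == NULL)
        return 0;
    for ( i = 0 ; ppszCodes_in[i] != NULL ; ++i )
        if (strcmp(pszCode_in, ppszCodes_in[i]) == 0)
            return 1;
    return 0;
}

/* Decide whether a field is processed.  An include list admits only the
   listed codes; an exclude list admits everything except the listed codes. */
int demoFieldIsProcessed(const char * pszCode_in,
                         const char * const * ppszIncl_in,
                         const char * const * ppszExcl_in)
{
    if ((ppszIncl_in != NULL) && (ppszIncl_in[0] != NULL))
        return demoCodeListed(pszCode_in, ppszIncl_in);
    if ((ppszExcl_in != NULL) && (ppszExcl_in[0] != NULL))
        return !demoCodeListed(pszCode_in, ppszExcl_in);
    return 1;       /* assumption: no list at all means process everything */
}
```

Since the two field types are mutually exclusive in a control file, the order of the two tests only matters for malformed input.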
The arguments to loadIntxCtlFile
are as follows:
pszFilename_in
cComment_in
pTextCtl_out
ppStringClasses_io
\ch
fields or added to by \scl
fields.
zero if successful, nonzero if an error occurs
5.46.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "textctl.h"            /* includes strclass.h */
    #include "allocmem.h"
    #include "rpterror.h"
    ...
    char          szIntxFilename_g[200];
    TextControl   sTextControl_g;
    StringClass * pStringClasses_g = NULL;

    static TextControl sDefaultTextControl_m = {
        NULL,       /* filename */
        NULL,       /* ordered array of lowercase letters */
        NULL,       /* ordered array of matching uppercase letters */
        NULL,       /* array of caseless letters */
        NULL,       /* list of input orthography changes */
        NULL,       /* list of output (orthography) changes */
        NULL,       /* list of format markers (fields) to include */
        NULL,       /* list of format markers (fields) to exclude */
        '\\',       /* initial character of format markers (field codes) */
        '%',        /* character for marking ambiguities and failures */
        '-',        /* character for marking decomposition */
        '|',        /* initial character of secondary format markers */
        NULL,       /* (Manuscripter) bar codes */
        TRUE,       /* flag whether to capitalize individual letters */
        TRUE,       /* flag whether to decapitalize/recapitalize */
        100         /* maximum number of decapitalization alternatives */
    };
    ...
    memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
    fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
    if (fgets( szIntxFilename_g, 200, stdin ) == NULL)
        szIntxFilename_g[0] = '\0';
    /* fgets keeps the newline; strip it so that an empty reply means "none" */
    szIntxFilename_g[strcspn(szIntxFilename_g, "\r\n")] = '\0';
    if (szIntxFilename_g[0])
    {
        /* both output arguments are passed by address */
        if (loadIntxCtlFile(szIntxFilename_g, ';',
                            &sTextControl_g, &pStringClasses_g) != 0)
        {
            reportError(ERROR_MSG,
                        "Error reading text control file %s\n",
                        szIntxFilename_g);
        }
    }
    if (    (sTextControl_g.cBarMark == NUL) &&
            (sTextControl_g.pszBarCodes != NULL) )
    {
        freeMemory(sTextControl_g.pszBarCodes);
        sTextControl_g.pszBarCodes = NULL;
    }
    if (    (sTextControl_g.cBarMark != NUL) &&
            (sTextControl_g.pszBarCodes == NULL) )
    {
        sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
                                                        "bdefhijmrsuvyz");
    }
`loadintx.c'
#include "textctl.h" /* or template.h or opaclib.h */ int loadOutxCtlFile(const char * pszFilename_in, int cComment_in, TextControl * pTextCtl_out, StringClass ** ppStringClasses_io);
loadOutxCtlFile
loads a text output control file into memory.
This is a standard format file containing one data record with the
following fields (not necessarily in this order):
\ambig
\ambig
field is optional, and may occur only
once.
\ch
\ch
field is optional, and may occur any number of times. An
ordered list of consistent changes is built by the function. Each
change is applied to each output word as many times as necessary before
the next change is applied.
\dsc
\dsc
field is optional, and may occur only once.
\format
\format
field is optional, and may occur only once.
\luwfc
\luwfc
field is optional, and may occur
any number of times. Its effect is cumulative. For lowercase and
uppercase forms that are represented by two or more adjacent characters
(bytes), use the \luwfcs
field described below.
\luwfcs
\luwfcs
field is optional, and may occur any
number of times. Its effect is cumulative. Note that \luwfcs
fields may be used to replace \luwfc
fields, or the two types of
fields may be mixed together in the control file.
The implementation underlying the \luwfcs
field does not require
that the lowercase and uppercase forms occupy the same number of
characters (bytes).
\scl
\scl
field is
optional, and any number of string classes may be defined. A string
class definition must occur before any \ch
field that uses that
string class.
\wfc
\wfc
field is
optional, and may occur any number of times. Its effect is cumulative.
For caseless forms that are represented by two or more adjacent characters
(bytes), use the \wfcs
field described below.
\wfcs
\wfcs
field is optional, and may occur
any number of times. Its effect is cumulative. Note that \wfcs
fields may be used to replace \wfc
fields, or the two types of
fields may be mixed together in the control file.
Note that these are only a subset of the fields allowed in a text input control file. For more details about this file, see section `The text output control file' in KTEXT Reference Manual.
The arguments to loadOutxCtlFile
are as follows:
pszFilename_in
cComment_in
pTextCtl_out
ppStringClasses_io
\ch
fields or added to by \scl
fields.
zero if successful, nonzero if an error occurs
5.47.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "textctl.h"            /* includes strclass.h */
    #include "rpterror.h"
    ...
    char          szOutxFilename_g[200];
    TextControl   sOutputControl_g;
    StringClass * pStringClasses_g = NULL;
    ...
    memset(&sOutputControl_g, 0, sizeof(TextControl));
    fprintf(stderr, "Text Output Control File (xxOUTX.CTL) [none]: ");
    if (fgets(szOutxFilename_g, 200, stdin) == NULL)
        szOutxFilename_g[0] = '\0';
    /* fgets keeps the newline; strip it so that an empty reply means "none" */
    szOutxFilename_g[strcspn(szOutxFilename_g, "\r\n")] = '\0';
    if (szOutxFilename_g[0])
    {
        /* both output arguments are passed by address */
        if (loadOutxCtlFile(szOutxFilename_g, ';',
                            &sOutputControl_g, &pStringClasses_g) != 0)
        {
            reportError(ERROR_MSG,
                        "Error reading text output control file %s\n",
                        szOutxFilename_g);
        }
    }
`loadoutx.c'
#include "textctl.h" /* or template.h or opaclib.h */ int matchAlphaChar(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
matchAlphaChar
checks whether the input string begins with a
multibyte alphabetic (word formation) character. If so, it returns the
number of bytes in the matched multibyte alphabetic character.
This function depends on previous calls to addWordFormationChars
,
addWordFormationCharStrings
, addLowerUpperWFChars
, and
addLowerUpperWFCharStrings
to establish the multibyte alphabetic
characters. (These functions are implicitly called by
loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to matchAlphaChar
are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte alphabetic character at the beginning of the input string, or zero if the string does not begin with a multibyte alphabetic character
5.48.4 Example See section 5.14 convLowerToUpper.
5.48.5 Source File `myctype.c'
#include "opaclib.h" int matchBeginning(const char * pszString_in, const char * pszBegin_in);
matchBeginning
compares two strings, using the end of the second
string as the cutoff point for the comparison. It is functionally
equivalent to
(strncmp(pszString_in, pszBegin_in, strlen(pszBegin_in)) == 0)
The arguments to matchBeginning
are as follows:
pszString_in
pszBegin_in
nonzero (TRUE) if the two strings are equal up to the end of the second string, otherwise zero (FALSE)
5.49.4 Example
#include "opaclib.h" ... char string[100], match[50]; ... if (matchBeginning(string, match)) { ... }
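The `strncmp` equivalence given above can be tried directly with the standard library. In this small sketch `demoMatchBeginning` is a hypothetical stand-in for the library function, written straight from that expression.

```c
#include <string.h>

/* Nonzero if pszString_in starts with pszBegin_in; the comparison stops
   at the end of the second string. */
int demoMatchBeginning(const char * pszString_in, const char * pszBegin_in)
{
    return strncmp(pszString_in, pszBegin_in, strlen(pszBegin_in)) == 0;
}
```

Two consequences of the definition are worth noting: an empty second string matches any first string, and a first string that is shorter than the second never matches, because the comparison reaches the first string's terminating NUL before the second string ends.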
`matchbeg.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ size_t matchBeginWithStringClass(const char * pszString_in, const StringClass * pClass_in);
matchBeginWithStringClass
searches a string class to find a
class member that matches the beginning of a string. It stops at the
first successful match.
The arguments to matchBeginWithStringClass
are as follows:
pszString_in
pClass_in
the length of the first successful match if found (effectively TRUE), otherwise zero (FALSE)
5.50.4 Example
#include "strclass.h" ... static StringClass * pClasses_m; ... int matchesClassMemberAtBeginning(const char * pszString_in, const char * pszClassName_in) { StringClass * pClass; pClass = findStringClass(pszClassName_in, pClasses_m); if (pClass == NULL) return 0; return matchBeginWithStringClass(pszString_in, pClass); }
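Because the search stops at the first successful match, the value returned depends on the order of the class members, not on which member is longest. The sketch below illustrates this with a plain NULL-terminated array of strings standing in for a `StringClass`; the array representation and the name `demoMatchBeginWithClass` are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the first member that matches the beginning of
   the string, or zero if no member matches. */
size_t demoMatchBeginWithClass(const char * pszString_in,
                               const char * const * ppszMembers_in)
{
    size_t i;
    size_t uiLength;

    for ( i = 0 ; ppszMembers_in[i] != NULL ; ++i ) {
        uiLength = strlen(ppszMembers_in[i]);
        if ((uiLength > 0) &&
            (strncmp(pszString_in, ppszMembers_in[i], uiLength) == 0))
            return uiLength;            /* first match wins */
    }
    return 0;
}
```

With the members listed as `"c"` then `"ch"`, the string `"chat"` yields 1; with the order reversed it yields 2. If a longest match is wanted under this reading, the longer members would have to be listed first.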
`strcla.c'
#include "textctl.h" int matchCaselessChar(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
matchCaselessChar
checks whether the input string begins with a
multibyte caseless character. If so, it returns the number of bytes in
the matched multibyte caseless character.
This function depends on previous calls to addWordFormationChars
or addWordFormationCharStrings
to establish the multibyte caseless
characters. (addWordFormationChars
and
addWordFormationCharStrings
are implicitly called by
loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to matchCaselessChar
are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte caseless character at the beginning of the input string, or zero if the string does not begin with a multibyte caseless character
5.51.4 Example See section 5.54 matchLowercaseChar.
5.51.5 Source File `myctype.c'
#include "opaclib.h" int matchEnd(const char * pszString_in, const char * pszTail_in);
matchEnd
compares the second string against the end of the
first string. It is functionally equivalent to
((strlen(pszString_in) < strlen(pszTail_in)) ? 0 : (strcmp(pszString_in + strlen(pszString_in) - strlen(pszTail_in), pszTail_in) == 0))
The arguments to matchEnd
are as follows:
pszString_in
pszTail_in
nonzero (TRUE) if the second string matches the end of the first string, otherwise zero (FALSE)
5.52.4 Example
#include "opaclib.h" ... char string[100], match[50]; ... if (matchEnd(string, match)) { ... }
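The equivalent expression given above is easier to read when the two lengths are computed once. In this sketch `demoMatchEnd` is a hypothetical stand-in written from that expression.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if pszTail_in matches the end of pszString_in. */
int demoMatchEnd(const char * pszString_in, const char * pszTail_in)
{
    size_t uiStringLength = strlen(pszString_in);
    size_t uiTailLength   = strlen(pszTail_in);

    if (uiStringLength < uiTailLength)
        return 0;               /* the tail cannot fit in the string */
    return strcmp(pszString_in + uiStringLength - uiTailLength,
                  pszTail_in) == 0;
}
```

The length check matters: without it, the pointer arithmetic would step before the start of the first string whenever the tail is the longer of the two.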
`matchend.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ size_t matchEndWithStringClass(const char * pszString_in, const StringClass * pClass_in);
matchEndWithStringClass
searches a string class to find a class
member that matches the end of a string. It stops at the first
successful match.
The arguments to matchEndWithStringClass
are as follows:
pszString_in
pClass_in
the length of the first successful match if found (effectively TRUE), otherwise zero (FALSE)
5.53.4 Example
#include "strclass.h" ... static StringClass * pClasses_m; ... int matchesClassMemberAtEnd(const char * pszString_in, const char * pszClassName_in) { StringClass * pClass; pClass = findStringClass(pszClassName_in, pClasses_m); if (pClass == NULL) return 0; return matchEndWithStringClass(pszString_in, pClass); }
`strcla.c'
#include "textctl.h" /* or template.h or opaclib.h */ int matchLowercaseChar(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
matchLowercaseChar
checks whether the input string begins with a
multibyte lowercase character. If so, it returns the number of bytes in
the matched multibyte lowercase character.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the multibyte lowercase
characters. (addLowerUpperWFChars
and
addLowerUpperWFCharStrings
are implicitly called by
loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to matchLowercaseChar
are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte lowercase character at the beginning of the input string, or zero if the string does not begin with a multibyte lowercase character
5.54.4 Example
    #include "textctl.h"

    #define CASELESS -1
    #define NOCAP     0
    #define INITCAP   1
    #define ALLCAP    2
    #define MIXCAP    3

    int getWordCase(const unsigned char * pszWord_in,
                    const TextControl *   pTextCtl_in)
    {
        unsigned              uiUpperCount = 0;
        unsigned              uiLowerCount = 0;
        int                   bFirstCap    = 0;
        int                   iLength;
        const unsigned char * p;        /* const, to match pszWord_in */

        for ( p = pszWord_in ; p && *p ; p += iLength )
        {
            iLength = matchLowercaseChar(p, pTextCtl_in);
            if (iLength != 0)
                ++uiLowerCount;
            else
            {
                iLength = matchUppercaseChar(p, pTextCtl_in);
                if (iLength != 0)
                {
                    ++uiUpperCount;
                    if (uiLowerCount == 0)
                        bFirstCap = 1;
                }
                else
                {
                    iLength = matchCaselessChar(p, pTextCtl_in);
                    if (iLength == 0)
                        iLength = 1;    /* not a word formation character */
                }
            }
        }
        if ((uiUpperCount == 0) && (uiLowerCount == 0))
            return CASELESS;
        else if (uiUpperCount == 0)
            return NOCAP;
        else if (bFirstCap && (uiUpperCount == 1))
            return INITCAP;
        else if (uiLowerCount == 0)
            return ALLCAP;
        else
            return MIXCAP;
    }
`myctype.c'
#include "textctl.h" /* or template.h or opaclib.h */ int matchUppercaseChar(const unsigned char * pszString_in, const TextControl * pTextCtl_in);
matchUppercaseChar
checks whether the input string begins with a
multibyte uppercase character. If so, it returns the number of bytes in
the matched multibyte uppercase character.
This function depends on previous calls to addLowerUpperWFChars
or
addLowerUpperWFCharStrings
to establish the multibyte uppercase
characters. (addLowerUpperWFChars
and
addLowerUpperWFCharStrings
are implicitly called by
loadIntxCtlFile
and loadOutxCtlFile
.)
The arguments to matchUppercaseChar
are as follows:
pszString_in
pTextCtl_in
the number of bytes occupied by the multibyte uppercase character at the beginning of the input string, or zero if the string does not begin with a multibyte uppercase character
5.55.4 Example See section 5.54 matchLowercaseChar.
5.55.5 Source File `myctype.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * mergeIntoStringList(StringList * pList_io, const char * pszString_in);
mergeIntoStringList
adds a string to the beginning of a list of
strings if it is not already present in the list.
The arguments to mergeIntoStringList
are as follows:
pList_io
pszString_in
duplicateString
is stored in the list, not the original string
itself.
a pointer to the possibly modified list of strings
5.56.4 Example
#include "strlist.h" ... StringList * pStrings = NULL; ... pStrings = mergeIntoStringList(pStrings, "this"); /* pStrings-->"this"-->NULL */ pStrings = mergeIntoStringList(pStrings, "test"); /* pStrings-->"test"-->"this"-->NULL */ pStrings = mergeIntoStringList(pStrings, "is"); /* pStrings-->"is"-->"test"-->"this"-->NULL */ pStrings = mergeIntoStringList(pStrings, "a"); /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */ pStrings = mergeIntoStringList(pStrings, "test"); /* pStrings-->"a"-->"is"-->"test"-->"this"-->NULL */
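The "add only if absent, and store a copy" behavior traced in the comments above can be sketched with standard allocation functions. `MergeNode` and `demoMergeIntoList` are hypothetical names; the library itself uses its own allocator and `duplicateString`, which are replaced here by `malloc` and `memcpy` for the sake of a self-contained illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringList. */
typedef struct merge_node {
    char *              pszString;
    struct merge_node * pNext;
} MergeNode;

/* Add a private copy of the string at the head of the list unless an
   equal string is already stored.  Returns the (possibly new) head. */
MergeNode * demoMergeIntoList(MergeNode * pList_io, const char * pszString_in)
{
    MergeNode * pNode;
    char *      pszCopy;
    size_t      uiSize;

    if (pszString_in == NULL)
        return pList_io;
    for ( pNode = pList_io ; pNode != NULL ; pNode = pNode->pNext )
        if (strcmp(pNode->pszString, pszString_in) == 0)
            return pList_io;            /* already present: no change */

    uiSize  = strlen(pszString_in) + 1;
    pNode   = malloc(sizeof *pNode);
    pszCopy = malloc(uiSize);
    if ((pNode == NULL) || (pszCopy == NULL)) {
        free(pNode);
        free(pszCopy);
        return pList_io;                /* out of memory: list unchanged */
    }
    memcpy(pszCopy, pszString_in, uiSize);
    pNode->pszString = pszCopy;
    pNode->pNext     = pList_io;
    return pNode;
}
```

Because the list owns its copies, the caller may reuse or free the original string immediately after the call. Always assign the return value back to the list pointer, since the head changes whenever a string is added.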
`add_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * mergeIntoStringListAtEnd(StringList * pList_io, const char * pszString_in);
mergeIntoStringListAtEnd
adds a string to the end of a list of
strings if it is not already present in the list.
The arguments to mergeIntoStringListAtEnd
are as follows:
pList_io
pszString_in
duplicateString
is stored in the list, not the original string
itself.
a pointer to the possibly modified list of strings
5.57.4 Example
#include "strlist.h" ... StringList * pStrings = NULL; ... pStrings = mergeIntoStringListAtEnd(pStrings, "this"); /* pStrings-->"this"-->NULL */ pStrings = mergeIntoStringListAtEnd(pStrings, "test"); /* pStrings-->"this"-->"test"-->NULL */ pStrings = mergeIntoStringListAtEnd(pStrings, "is"); /* pStrings-->"this"-->"test"-->"is"-->NULL */ pStrings = mergeIntoStringListAtEnd(pStrings, "a"); /* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */ pStrings = mergeIntoStringListAtEnd(pStrings, "test"); /* pStrings-->"this"-->"test"-->"is"-->"a"-->NULL */
`appnd_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * mergeTwoStringLists(StringList * pFirstList_io, StringList * pSecondList_io);
mergeTwoStringLists
merges two lists of strings together to form
a single list. Any strings in the second list that exist in the first
list are freed. Neither of the original lists survives this operation.
The arguments to mergeTwoStringLists
are as follows:
pFirstList_io
pSecondList_io
a pointer to the merged list
5.58.4 Example
#include "strlist.h" ... StringList * pStrings = NULL; StringList * pStrings1 = NULL; StringList * pStrings2 = NULL; ... pStrings1 = mergeIntoStringListAtEnd(pStrings1, "this"); pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test"); pStrings1 = mergeIntoStringListAtEnd(pStrings1, "is"); pStrings1 = mergeIntoStringListAtEnd(pStrings1, "a"); pStrings1 = mergeIntoStringListAtEnd(pStrings1, "test"); pStrings2 = mergeIntoStringList(pStrings2, "that"); pStrings2 = mergeIntoStringList(pStrings2, "test"); pStrings2 = mergeIntoStringList(pStrings2, "is"); pStrings2 = mergeIntoStringList(pStrings2, "good"); /* pStrings1-->"this"-->"test"-->"is"-->"a"-->NULL */ /* pStrings2-->"good"-->"is"-->"test"-->"that"-->NULL */ pStrings = mergeTwoStringLists(pStrings1, pStrings2); /* pStrings-->"good"-->"that"-->"this"-->"test"-->"is"-->"a"-->NULL */ /* pStrings1-->-----------------^ */ /* pStrings2-->??? */
`cat_sl.c'
#include "change.h" /* or textctl.h or template.h or opaclib.h */ Change * parseChangeString(const char * pszString_in, const StringClass * pClassList_in);
parseChangeString
parses a string to build a Change
structure.
The arguments to parseChangeString
are as follows:
pszString_in
pClassList_in
a pointer to a newly allocated Change structure, or NULL
if an
error occurred while parsing the change definition
5.59.4 Example
#include "change.h" /* includes strclass.h */ ... Change * addChange(const char * pszChange_in, Change * pChanges_io, const StringClass * pClasses_in) { Change * pChange; Change * pTail; pChange = parseChangeString(pszChange_in, pClasses_in); if (pChange != NULL) { if (pChanges_io == NULL) return pChange; /* * keep the list of changes in the original order */ for (pTail = pChanges_io ; pTail->pNext ; pTail = pTail->pNext) ; pTail->pNext = pChange; } return pChanges_io; }
`change.c'
#include "opaclib.h" void promptUser(const char * pszPrompt_in, char * pszBuffer_out, unsigned uiBufferSize_in);
promptUser
prompts the user, then reads a line of input from the
keyboard (normally the standard input). If an EOF
occurs,
promptUser
tries to reopen the keyboard.
The arguments to promptUser
are as follows:
pszPrompt_in
pszBuffer_out
uiBufferSize_in
NUL
).
none
5.60.4 Example
#include <stdio.h> #include "opaclib.h" ... char szFilename_g[BUFSIZ+1]; FILE * pInputFP_g; char szBuffer_g[17]; long iRepeatCount_g; ... promptUser("Data file: ", szFilename_g, BUFSIZ); pInputFP_g = fopen(szFilename_g, "r"); ... promptUser("Number of iterations to perform: ", szBuffer_g, 16); iRepeatCount_g = strtol(szBuffer_g, NULL, 10);
`promptus.c'
#include "opaclib.h" char * readLineFromFile(FILE * pInputFP_in, unsigned * puiLineNumber_io, int cComment_in);
readLineFromFile
reads an arbitrarily long line of input text,
erasing the trailing newline character. The string returned is
overwritten or freed at the next call to readLineFromFile
.
The arguments to readLineFromFile
are as follows:
pInputFP_in
FILE
pointer.
puiLineNumber_io
NULL
.
cComment_in
the address of the buffer containing the NUL
-terminated line, or
NULL
if already at the end of the file
5.61.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "opaclib.h"

    void processFile(const char * pszFilename_in)
    {
        FILE *   pInputFP;
        unsigned uiLineNumber;
        char *   pszLine;

        if (pszFilename_in == NULL)
            return;
        pInputFP = fopen(pszFilename_in, "r");
        if (pInputFP == NULL)
            return;
        /* start at zero so that the final value is the number of lines
           read (assuming the counter is incremented once per line) */
        uiLineNumber = 0;
        while ((pszLine = readLineFromFile(pInputFP,
                                           &uiLineNumber, ';')) != NULL)
        {
            ...
        }
        fclose(pInputFP);
        printf("%u lines read from %s\n", uiLineNumber, pszFilename_in);
    }
`readline.c'
#include "template.h" /* or opaclib.h */ WordTemplate ** readSentenceOfTemplates(FILE * pInputFP_in, const char * pszAnaFile_in, const char * pszFinalPunct_in, TextControl * pTextCtl_in, FILE * pLogFP_in);
readSentenceOfTemplates
reads an arbitrarily long sentence
(sequence of words) from an input analysis file, building an array of
WordTemplate
data structures. The sentence is terminated by a
sentence-final punctuation character from pszFinalPunct_in
.
The arguments to readSentenceOfTemplates
are as follows:
pInputFP_in
FILE
pointer.
pszAnaFile_in
pszFinalPunct_in
NUL
-terminated string of punctuation characters that
mark the end of a sentence.
pTextCtl_in
pLogFP_in
FILE
pointer, used to log error messages, or
NULL
.
a pointer to a dynamically allocated NULL
-terminated array of
pointers to dynamically allocated WordTemplate
structures
5.62.4 Example
    #include <stdio.h>
    #include "template.h"
    #include "allocmem.h"
    #include "rpterror.h"
    ...
    TextControl sTextControl_g;
    static const char szSentenceFinalPunc_m[] = ".!?";
    static const char szCannotOpen_m[] =
                            "Warning: cannot open analysis input file %s\n";
    ...
    /* returns the number of sentences read */
    unsigned processSentences(char * pszAnaFile_in, FILE * pLogFP_in)
    {
        FILE *          pInputFP;
        WordTemplate ** pSentence;
        unsigned        uiSentenceCount;
        unsigned        i;
        ...
        pInputFP = fopen(pszAnaFile_in, "r");
        if (pInputFP == NULL)
        {
            reportError(ERROR_MSG, szCannotOpen_m, pszAnaFile_in);
            if (pLogFP_in != NULL)
                fprintf(pLogFP_in, szCannotOpen_m, pszAnaFile_in);
            return 0;
        }
        for ( uiSentenceCount = 0 ;; ++uiSentenceCount )
        {
            pSentence = readSentenceOfTemplates(pInputFP,
                                                pszAnaFile_in,
                                                szSentenceFinalPunc_m,
                                                &sTextControl_g,
                                                pLogFP_in);
            if (pSentence == NULL)
                break;
            ...
            for ( i = 0 ; pSentence[i] ; ++i )
                freeWordTemplate( pSentence[i] );
            freeMemory( pSentence );
        }
        fclose(pInputFP);
        return uiSentenceCount;
    }
`senttemp.c'
#include "opaclib.h" char ** readStdFormatField(FILE * pInputFP_in, const char ** ppszFieldCodes_in, int cComment_in);
readStdFormatField
reads an arbitrarily large text field that
starts with a backslash marker at the beginning of a line. Each line
of the input field is stored separately in a NULL
-terminated
array of strings. If the field code at the beginning matches one of
those in the input array of field codes, it is replaced by a single
byte containing the 1-based index of the matching field code.
Otherwise, the field code is left intact except that the backslash
character is replaced by the character code 255
('\377'
).
This function is an alternative to readStdFormatRecord
, which
potentially reads several fields at a time.
The arguments to readStdFormatField
are as follows:
pInputFP_in
FILE
pointer.
ppszFieldCodes_in
NULL
-terminated array of field code strings.
cComment_in
a pointer to a dynamically allocated NULL
-terminated array of
pointers to dynamically allocated lines of text
5.63.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "opaclib.h"
    #include "rpterror.h"
    ...
    static char szWhitespace_m[7] = " \t\r\n\f\v";
    ...
    int read_control_file(char * pszControlFile_in)
    {
        int          i;
        char *       pszRuleFile    = NULL;
        char *       pszLexiconFile = NULL;
        char *       pszGrammarFile = NULL;
        StringList * pTraceList     = NULL;
        char *       pszMorph;
        FILE *       pControlFP;
        char **      ppszField;
        char *       pszLine;
        static char * aszCodes_s[] = {
            "\\rules", "\\lexicon", "\\grammar", "\\trace", ..., NULL };

        if (pszControlFile_in == NULL)
            return FALSE;
        pControlFP = fopen(pszControlFile_in, "r");
        if (pControlFP == (FILE *)NULL)
        {
            reportError(WARNING_MSG,
                        "Cannot open control file %s\n", pszControlFile_in);
            return FALSE;
        }
        for (;;)
        {
            ppszField = readStdFormatField(pControlFP, aszCodes_s, NUL);
            if (ppszField == NULL)
                break;
            switch (**ppszField)
            {
            case 1:                     /* "\\rules" */
                if (pszRuleFile != NULL)
                    reportError(WARNING_MSG,
                                "Rule file already specified: %s\n",
                                pszRuleFile);
                else
                {
                    for ( i = 0 ; ppszField[i] ; ++i )
                    {
                        pszLine = ppszField[i];
                        if (i == 0)
                            ++pszLine;  /* skip the field code byte */
                        pszRuleFile = strtok(pszLine, szWhitespace_m);
                        if (pszRuleFile != NULL)
                            break;
                    }
                }
                break;
            case 2:                     /* "\\lexicon" */
                if (pszLexiconFile != NULL)
                    reportError(WARNING_MSG,
                                "Lexicon file already specified: %s\n",
                                pszLexiconFile);
                else
                {
                    for ( i = 0 ; ppszField[i] ; ++i )
                    {
                        pszLine = ppszField[i];
                        if (i == 0)
                            ++pszLine;
                        pszLexiconFile = strtok(pszLine, szWhitespace_m);
                        if (pszLexiconFile != NULL)
                            break;
                    }
                }
                break;
            case 3:                     /* "\\grammar" */
                if (pszGrammarFile != NULL)
                    reportError(WARNING_MSG,
                                "Grammar file already specified: %s\n",
                                pszGrammarFile);
                else
                {
                    for ( i = 0 ; ppszField[i] ; ++i )
                    {
                        pszLine = ppszField[i];
                        if (i == 0)
                            ++pszLine;
                        pszGrammarFile = strtok(pszLine, szWhitespace_m);
                        if (pszGrammarFile != NULL)
                            break;
                    }
                }
                break;
            case 4:                     /* "\\trace" */
                for ( i = 0 ; ppszField[i] ; ++i )
                {
                    pszLine = ppszField[i];
                    if (i == 0)
                        ++pszLine;
                    for ( pszMorph = strtok(pszLine, szWhitespace_m) ;
                          pszMorph ;
                          pszMorph = strtok(NULL, szWhitespace_m) )
                    {
                        pTraceList = mergeIntoStringList(pTraceList,
                                                         pszMorph);
                    }
                }
                break;
            ...
            default:
                reportError(WARNING_MSG,
                            "Unknown field: \\%s\n", ppszField[0] + 1);
                break;
            }
            for ( i = 0 ; ppszField[i] ; ++i )
                freeMemory(ppszField[i]);
            freeMemory(ppszField);
        }
        fclose(pControlFP);
        ...
        return TRUE;
    }
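The first byte of the first line carries the result of the field code lookup: a 1-based index for a recognized code, or the value 255 for an unrecognized one. A small helper makes that encoding explicit. This is a sketch under the description above; `demoFieldCodeIndex` is a hypothetical name and not part of the library.

```c
#include <stddef.h>

/* Classify the first byte of the first line returned for a field.
   Returns the 1-based index of the matching field code, or 0 when the
   byte is 255 (unrecognized field code) or otherwise out of range. */
unsigned demoFieldCodeIndex(const char * pszFirstLine_in,
                            unsigned     uiCodeCount_in)
{
    unsigned char ucTag;

    if (pszFirstLine_in == NULL)
        return 0;
    /* plain char may be signed, so 255 must be read as unsigned char */
    ucTag = (unsigned char)pszFirstLine_in[0];
    if (ucTag == 255)
        return 0;
    if ((ucTag >= 1) && (ucTag <= uiCodeCount_in))
        return ucTag;
    return 0;
}
```

The cast is the point of the sketch: on platforms where plain `char` is signed, the byte 255 reads back as -1, so a `switch` on the raw character should rely on its `default` case rather than on a `case 255:` label.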
`readfiel.c'
#include "record.h" /* or opaclib.h */ char * readStdFormatRecord(FILE * pInputFP_in, const CodeTable * pCodeTable_in, int cComment_in, unsigned * puiRecordCount_io);
readStdFormatRecord
reads the next record from a standard format
file. The record is stored in memory as a series of
NUL
-terminated strings stored consecutively in a single buffer,
with the record terminated by two consecutive NUL
bytes. The
first character of each string is either a character representing the
field code (if found in the code table), or a backslash indicating that
the field code was not recognized.
This function is an alternative to readStdFormatField
, which
always reads only one field at a time.
The arguments to readStdFormatRecord
are as follows:
pInputFP_in
FILE
pointer.
pCodeTable_in
cComment_in
puiRecordCount_io
NULL
.
a pointer to the buffer containing the record, or NULL
for
EOF
.
5.64.4 Example
    #include <stdio.h>
    #include <string.h>
    #include "record.h"
    ...
    void loadStdFmtFile(pszFilename_in)
    char * pszFilename_in;
    {
        FILE *   pInputFP;
        char *   pRecord;
        char *   pszField;
        char *   pszNextField;
        int      c;
        unsigned uiRecordCount = 0;
        static CodeTable sCodeTable_s = {
            "\\a\0A\0"
            "\\d\0D\0"
            "\\w\0W\0"
            "\\f\0F\0"
            "\\c\0C\0"
            "\\n\0N\0",
            6,
            "\\a"
        };

        if (pszFilename_in == NULL)
            return;
        pInputFP = fopen(pszFilename_in, "r");
        if (pInputFP == NULL)
            return;
        while ((pRecord = readStdFormatRecord(pInputFP,
                                              &sCodeTable_s,
                                              ';',
                                              &uiRecordCount)) != NULL)
        {
            pszField = pRecord;
            while ((c = *pszField++) != '\0')
            {
                /* find the next field before this one is modified */
                pszNextField = pszField + strlen(pszField) + 1;
                switch (c)
                {
                case 'A':   ...   break;
                case 'C':   ...   break;
                case 'D':   ...   break;
                case 'F':   ...   break;
                case 'N':   ...   break;
                case 'W':   ...   break;
                default:    ...   break;
                }
                pszField = pszNextField;
            }
            ...
        }
        cleanupAfterStdFormatRecord();
        fclose(pInputFP);
        return;
    }
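The record layout described above (consecutive NUL-terminated strings, ended by two consecutive NUL bytes) can be walked with nothing more than `strlen`. The sketch below counts the fields in such a buffer that carry a given code character; `demoCountFields` is a hypothetical helper, and the buffer in the usage note is hand-built rather than produced by `readStdFormatRecord`.

```c
#include <string.h>

/* Count the fields in a record buffer whose first character is cCode_in.
   The buffer is a series of NUL-terminated strings; an empty string
   (that is, a second consecutive NUL) marks the end of the record. */
unsigned demoCountFields(const char * pRecord_in, char cCode_in)
{
    unsigned     uiCount  = 0;
    const char * pszField = pRecord_in;

    while (*pszField != '\0') {
        if (*pszField == cCode_in)
            ++uiCount;
        pszField += strlen(pszField) + 1;   /* step past this field's NUL */
    }
    return uiCount;
}
```

For instance, the array `"Aone\0Dtwo\0Athree\0"` (whose implicit terminator supplies the second NUL) holds two `A` fields and one `D` field. The same stepping expression appears in the example above as the computation of `pszNextField`.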
`record.c'
#include "template.h" /* or opaclib.h */ WordTemplate * readTemplateFromAnalysis( FILE * pInputFP_in, const TextControl * pTextCtl_in);
readTemplateFromAnalysis
fills in a WordTemplate
data
structure from an AMPLE style analysis file.
The arguments to readTemplateFromAnalysis
are as follows:
pInputFP_in
FILE
pointer.
pTextCtl_in
a pointer to a dynamically allocated WordTemplate
data
structure, or NULL
if either EOF
or an error occurs
5.65.4 Example
#include "template.h"
#include "rpterror.h"
...
void synthesizeFile(
    char *        pszInputFile_in,
    char *        pszOutputFile_in,
    TextControl * pTextCtl_in)
{
    FILE *         pInputFP;
    FILE *         pOutputFP;
    WordTemplate * pWord;
    WordAnalysis * pAnal;
    ...
    /*
     *  open the files
     */
    if ((pszInputFile_in == NULL) || (pszOutputFile_in == NULL))
        return;
    pInputFP = fopen(pszInputFile_in, "r");
    if (pInputFP == NULL)
    {
        reportError(WARNING_MSG, "Cannot open input file %s\n",
                    pszInputFile_in);
        return;
    }
    pOutputFP = fopen(pszOutputFile_in, "w");
    if (pOutputFP == NULL)
    {
        reportError(WARNING_MSG, "Cannot open output file %s\n",
                    pszOutputFile_in);
        fclose(pInputFP);
        return;
    }
    /*
     *  process the data
     */
    for (;;)
    {
        /* pTextCtl_in is already a pointer, so it is passed as is */
        pWord = readTemplateFromAnalysis(pInputFP, pTextCtl_in);
        if (pWord == NULL)
            break;
        ...
        for ( pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext )
        {
            ...
        }
        ...
        writeTextFromTemplate( pOutputFP, pWord, pTextCtl_in);
        freeWordTemplate( pWord );
    }
    ...
    fclose(pInputFP);
    fclose(pOutputFP);
}
`dtbin.c'
#include "template.h" /* or opaclib.h */ WordTemplate * readTemplateFromText(FILE * pInputFP_in, const TextControl * pTextCtl_in);
readTemplateFromText
reads a word from a text file into a
WordTemplate
structure.
The arguments to readTemplateFromText
are as follows:
pInputFP_in
FILE
pointer.
pTextCtl_in
a pointer to a dynamically allocated WordTemplate
data
structure, or NULL
if either EOF
or an error occurs
5.66.4 Example See section 5.38 freeWordTemplate.
5.66.5 Source File `textin.c'
#include "template.h" /* or opaclib.h */ WordTemplate * readTemplateFromTextString(unsigned char ** ppszString_io, const TextControl * pTextCtl_in);
readTemplateFromTextString
reads a word from a text string into a
WordTemplate
structure.
The arguments to readTemplateFromTextString
are as follows:
ppszString_io
pTextCtl_in
a pointer to a dynamically allocated WordTemplate
data
structure, or NULL
if either the string consists merely of
NUL
or an error occurs
5.67.4 Example
#include "template.h"
...
TextControl sTextCtl_g;
...
WordAnalysis * merge_analyses(
    WordAnalysis * pList_in,
    WordAnalysis * pAnal_in)
{
    ...
}
...
void process(
    unsigned char * pszInputText_in,
    FILE *          pOutputFP_in)
{
    unsigned char * pszInputText;
    unsigned char * pszWord;
    WordTemplate *  pWord;
    WordAnalysis *  pAnal;
    unsigned        uiAmbiguityCount;
    unsigned long   uiWordCount;
    int             i;

    /* work on a copy, since the string pointer is advanced word by word */
    pszInputText = (unsigned char *)duplicateString(
                                        (const char *)pszInputText_in);
    pszWord = pszInputText;
    for ( uiWordCount = 0L ;; )
    {
        pWord = readTemplateFromTextString(&pszWord, &sTextCtl_g);
        if (pWord == NULL)
            break;
        uiAmbiguityCount = 0;
        if (pWord->paWord != NULL)
        {
            for ( i = 0 ; pWord->paWord[i] ; ++i )
            {
                pAnal = analyze(pWord->paWord[i]);
                pWord->pAnalyses = merge_analyses(pWord->pAnalyses, pAnal);
            }
            for (pAnal = pWord->pAnalyses ; pAnal ; pAnal = pAnal->pNext)
                ++uiAmbiguityCount;
        }
        writeTemplate(pOutputFP_in, NULL, pWord, &sTextCtl_g);
        freeWordTemplate(pWord);
    }
    freeMemory(pszInputText);
}
`textin.c'
#include "allocmem.h" /* or opaclib.h */ void * reallocMemory(void * pBuffer_in, size_t uiSize_in);
reallocMemory
adjusts an allocated buffer to a new size.
It provides a "safe" interface to either realloc
or
malloc
, depending on whether or not pBuffer_in
is
NULL
. Running out of memory is handled the same as for
allocMemory
; see
section 5.8 allocMemory.
The arguments to reallocMemory
are as follows:
pBuffer_in
allocMemory
, reallocMemory
, or duplicateString
.
It also may be NULL
to allocate a new block of memory.
uiSize_in
a pointer to a possibly reallocated block
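The choice between `malloc` and `realloc` described above can be sketched as follows. This is an illustration only, not the library's source: `sketchReallocMemory` is a hypothetical name, and treating an allocation failure as fatal is an assumption made to keep the sketch short (the library's actual behavior is the one documented for `allocMemory`).

```c
#include <stdio.h>
#include <stdlib.h>

/*
 *  Grow, shrink, or create a buffer.  A NULL pBuffer_in means "allocate a
 *  new block", so malloc is called instead of realloc.
 *  ASSUMPTION: running out of memory is treated as fatal here; the
 *  library's real policy is the one documented for allocMemory.
 */
static void * sketchReallocMemory(void * pBuffer_in, size_t uiSize_in)
{
    void * pNew;

    if (pBuffer_in == NULL)
        pNew = malloc(uiSize_in);
    else
        pNew = realloc(pBuffer_in, uiSize_in);
    if ((pNew == NULL) && (uiSize_in != 0))
    {
        fprintf(stderr, "Out of memory requesting %lu bytes\n",
                (unsigned long)uiSize_in);
        exit(EXIT_FAILURE);
    }
    return pNew;
}
```

Because the old pointer may be invalid after the call, callers should always store the returned pointer, as in `p = sketchReallocMemory(p, uiNewSize);`.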
5.68.4 Example See section 5.29 fitAllocStringExactly.
5.68.5 Source File `allocmem.c'
#include "template.h" /* or opaclib.h */ void recapitalizeWord(char * pszWord_io, int iRecap_in, const TextControl * pTextCtl_in);
recapitalizeWord
tries to reimpose capitalization as it was in
the original input text.
The arguments to recapitalizeWord
are as follows:
pszWord_io
iRecap_in
0 (NOCAP)
1 (INITCAP)
2 (ALLCAP)
4-65535
4
encoding the capitalization of the first character, 8
encoding the second character, and so on.
pTextCtl_in
none
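The values of `iRecap_in` listed above can be decoded along the following lines. This is a hedged sketch of one plausible reading, not the library's implementation: `sketchRecapitalize` and the `SKETCH_` macros are hypothetical names, values of 4 and above are assumed to be a bit mask, and the standard `toupper` stands in for whatever case mapping the `TextControl` structure provides.

```c
#include <ctype.h>
#include <stddef.h>

#define SKETCH_NOCAP    0   /* leave the word alone */
#define SKETCH_INITCAP  1   /* capitalize the first character */
#define SKETCH_ALLCAP   2   /* capitalize every character */

/*
 *  Reimpose capitalization on an ASCII word.
 *  ASSUMPTIONS: values of 4 and above are read as a bit mask in which
 *  4 marks the first character, 8 the second, and so on (14 characters
 *  fit below 65536); toupper replaces the TextControl case tables.
 */
static void sketchRecapitalize(char * pszWord_io, int iRecap_in)
{
    size_t i;

    if ((pszWord_io == NULL) || (iRecap_in == SKETCH_NOCAP))
        return;
    if (iRecap_in == SKETCH_INITCAP)
    {
        pszWord_io[0] = (char)toupper((unsigned char)pszWord_io[0]);
        return;
    }
    for ( i = 0 ; pszWord_io[i] != '\0' ; ++i )
    {
        if (    (iRecap_in == SKETCH_ALLCAP)
             || ((i < 14) && ((unsigned)iRecap_in & (4u << i))) )
            pszWord_io[i] = (char)toupper((unsigned char)pszWord_io[i]);
    }
}
```

Under these assumptions a value of 20 (4 + 16) capitalizes the first and third characters of the word.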
5.69.4 Example
#include "template.h" void fix_new_words(pTemplate_io, pTextCtl_in) WordTemplate * pTemplate_io; const TextControl * pTextCtl_in; { StringList * pWord; char * p; if ((pTemplate_io == NULL) || (pTemplate_io->pNewWords == NULL)) return; if (pTextCtl_in == NULL) return; /* * apply orthography changes to the word and recapitalize it */ for ( pWord = pTemplate_io->pNewWords ; pWord ; pWord = pWord->pNext ) { /* * apply output orthography changes and recapitalize */ p = applyChanges(pWord->pszString, pTextCtl_in->pOutputChanges ); recapitalizeWord( p, pTemplate_io->iCapital, pTextCtl_in); /* * store the modified wordform */ freeMemory(pWord->pszString); pWord->pszString = p; } }
`textout.c'
#include "trie.h" /* or opaclib.h */ int removeDataFromTrie(Trie * pTrieHead_in, char * pszKey_in, void * pInfo_in, void * (* pfRemoveInfo_in)(void * pOld_in, void * pList_io));
removeDataFromTrie
removes a stored piece of information from a
trie.
The arguments to removeDataFromTrie
are as follows:
pTrieHead_in
pszKey_in
pInfo_in
pfRemoveInfo_in
pOld_in
pInfo_in
).
pList_io
Trie
node
(Trieinfo
).
pTrieInfo
.
zero if successful, nonzero if an error occurs
5.70.4 Example
#include <string.h>
#include "trie.h"
#include "rpterror.h"
#include "allocmem.h"
...
typedef struct lex_item {
    struct lex_item * pLink;        /* link to next item */
    struct lex_item * pNext;        /* link to next homograph */
    unsigned char *   pszForm;      /* lexical form (word) */
    unsigned char *   pszGloss;     /* lexical gloss */
    unsigned short    uiCategory;   /* lexical category */
} LexItem;
...
Trie *        pLexicon_g;
unsigned long uiLexiconCount_g;
static char   szWhitespace_m[7] = " \t\r\n\f\v";
...
static void * remove_lex_item(void * pDefunct_in, void * pList_in)
{
    /* a void pointer cannot be dereferenced, so convert it first */
    LexItem * pDefunct = (LexItem *)pDefunct_in;
    LexItem * pLex;
    /*
     *  be a little paranoid
     */
    if (pDefunct == NULL)
        return pList_in;
    /*
     *  handle removing the head of the list
     */
    if (pDefunct_in == pList_in)
        return pDefunct->pLink;
    /*
     *  unlink from both the general list and the list of homographs
     */
    for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
    {
        if (pLex->pNext == pDefunct)
            pLex->pNext = pDefunct->pNext;
        if (pLex->pLink == pDefunct)
        {
            pLex->pLink = pDefunct->pLink;
            break;      /* no need to check further */
        }
    }
    return pList_in;
}

void remove_from_lexicon(char * pszForm_in,
                         char * pszGloss_in,
                         char * pszCategory_in)
{
    LexItem *      pLex;
    unsigned short uiCategory;

    if (    (pszForm_in     == NULL)
         || (pszGloss_in    == NULL)
         || (pszCategory_in == NULL) )
        return;
    uiCategory = index_lexical_category(pszCategory_in);
    for ( pLex = findDataInTrie(pLexicon_g, pszForm_in) ;
          pLex ;
          pLex = pLex->pLink )
    {
        if (    (strcmp((char *)pLex->pszForm,  pszForm_in)  == 0)
             && (strcmp((char *)pLex->pszGloss, pszGloss_in) == 0)
             && (pLex->uiCategory == uiCategory) )
        {
            removeDataFromTrie(pLexicon_g, pszForm_in, pLex,
                               remove_lex_item);
            break;
        }
    }
}
`trie.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * removeFromStringList(StringList * pList_io, const char * pszString_in);
removeFromStringList
removes the first occurrence of a string
from a list of strings.
The arguments to removeFromStringList
are as follows:
pList_io
pszString_in
a pointer to the (possibly shorter) list, or NULL
if the only
item in the list was removed
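The removal described above is ordinary singly linked list surgery. The following sketch shows the idea under stated assumptions: `SketchStringList` is a stand-in that assumes only the `pNext` and `pszString` fields used by other examples in this manual, `sketchRemoveFromList` is a hypothetical name, and freeing both the node and its string with `free` is a guess about ownership.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for StringList; only the pNext and pszString fields that the
 * examples in this manual use are assumed. */
typedef struct sketch_strlist {
    struct sketch_strlist * pNext;
    char *                  pszString;
} SketchStringList;

/*
 *  Remove the first node whose string equals pszString_in and return the
 *  (possibly new) head of the list.
 *  ASSUMPTION: the node and its string were both obtained from malloc and
 *  are both freed here.
 */
static SketchStringList * sketchRemoveFromList(SketchStringList * pList_io,
                                               const char * pszString_in)
{
    SketchStringList ** ppLink;
    SketchStringList *  pNode;

    if (pszString_in == NULL)
        return pList_io;
    for ( ppLink = &pList_io ; *ppLink != NULL ; ppLink = &(*ppLink)->pNext )
    {
        pNode = *ppLink;
        if (strcmp(pNode->pszString, pszString_in) == 0)
        {
            *ppLink = pNode->pNext;     /* bypass the node */
            free(pNode->pszString);
            free(pNode);
            break;                      /* only the first occurrence */
        }
    }
    return pList_io;
}
```

Since the head of the list may change, or the list may become empty, the caller must store the return value, just as the example below does with `pNameList_m`.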
5.71.4 Example
#include "strlist.h" ... static StringList * pNameList_m; ... char * pszName; ... pNameList_m = removeFromStringList(pNameList_m, pszName); ...
`rmstr_sl.c'
#include "rpterror.h" /* or opaclib.h */ void reportError(int eMessageType_in, const char * pszFormat_in, ...);
reportError
reports an error message to the user. For MS-DOS
and Unix, reportError
writes to the standard error output. The
message is also written to the standard output if it has been
redirected. For GUI programs, the programmer must write a different
version of reportError
to satisfy the link requirements of other
functions in the OPAC library. This would typically display a message
box.
The arguments to reportError
are as follows:
eMessageType_in
ERROR_MSG
WARNING_MSG
DEBUG_MSG
pszFormat_in
printf
style format string for the (error) message.
...
pszFormat_in
).
none
5.72.4 Example See section 5.1 addDataToTrie.
5.72.5 Source File `rpterror.c'
#include "rpterror.h" /* or opaclib.h */ void reportMessage(int bNotSilent_in, const char * pszFormat_in, ...);
reportMessage
displays a message with zero or more arguments.
For MS-DOS and Unix, reportMessage
writes to the standard error
output. The message is also written to the standard output if it has
been redirected. For GUI programs, the programmer must write a different
version of reportMessage
to satisfy the link requirements of other
functions in the OPAC library. This would typically write to a message
window.
The arguments to reportMessage
are as follows:
bNotSilent_in
TRUE
(nonzero). If FALSE
(zero), the message is written only to the
standard output (stdout
), and then only if it has been
redirected. This allows programs to have a "quiet" mode of
operation without requiring a global variable.
pszFormat_in
printf
style format string for the message.
...
pszFormat_in
).
none
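The effect of `bNotSilent_in` can be sketched for a Unix-like system as shown below. This is an illustration under stated assumptions, not the library's code: `sketchReportMessage` is a hypothetical name, the POSIX `isatty` call is assumed as the test for redirected standard output, and the return value exists only to make the behavior observable.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>     /* isatty, STDOUT_FILENO (POSIX) */

/*
 *  Write a printf style message following the rules given above.
 *  Returns the number of streams written to (0, 1, or 2) so that the
 *  behavior is easy to check.
 *  ASSUMPTION: "redirected" is detected with the POSIX isatty call; the
 *  library may use a different test, especially under MS-DOS.
 */
static int sketchReportMessage(int bNotSilent_in,
                               const char * pszFormat_in, ...)
{
    va_list args;
    int     iStreams = 0;

    if (pszFormat_in == NULL)
        return 0;
    if (bNotSilent_in)
    {
        va_start(args, pszFormat_in);
        vfprintf(stderr, pszFormat_in, args);
        va_end(args);
        ++iStreams;
    }
    if (!isatty(STDOUT_FILENO))         /* stdout goes to a file or pipe */
    {
        va_start(args, pszFormat_in);
        vfprintf(stdout, pszFormat_in, args);
        va_end(args);
        ++iStreams;
    }
    return iStreams;
}
```

A program with a quiet option could pass that option's flag straight through as the first argument instead of consulting a global variable at every call site.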
5.73.4 Example
#include "rpterror.h"
...
static int iDebugLevel_m;
...
static int read_token(char * pszBuffer_in, unsigned uiBufferSize_in)
{
    int iTokenType;
    ...
    if (iDebugLevel_m >= 8)
    {
        /* the first argument (bNotSilent_in) must always be supplied */
        reportMessage(TRUE, "DEBUG read_token(\"%s\",%u) => ",
                      pszBuffer_in, uiBufferSize_in);
        switch (iTokenType)
        {
        case BECOMES:   reportMessage(TRUE, "BECOMES_TOKEN");       break;
        case KEYWORD:   reportMessage(TRUE, "KEYWORD_TOKEN");       break;
        case SYMBOL:    reportMessage(TRUE, "SYMBOL_TOKEN");        break;
        default:        reportMessage(TRUE, "'%c'\t", iTokenType);  break;
        }
        reportMessage(TRUE, "\n");
    }
    return( iTokenType );
}
`rptmessg.c'
#include "opaclib.h" void reportProgress(unsigned long uiCount_in);
reportProgress
displays a progress report based on a progress
counter.
The standard version of reportProgress
actually does nothing.
For GUI programs, the programmer may write a version of
reportProgress
to display some sort of progress message using
the progress counter.
reportProgress
has one argument:
uiCount_in
none
5.74.4 Example
#include "opaclib.h" ... static unsigned long uiTokenCount_m; ... static int read_token(pszBuffer_in, uiBufferSize_in) char * pszBuffer_in; unsigned uiBufferSize_in; { int iTokenType; ... ++uiTokenCount_m; reportProgress( uiTokenCount_m ); return( iTokenType ); }
`rptprgrs.c'
#include "textctl.h" /* or template.h or opaclib.h */ void resetTextControl(TextControl * pTextCtl_io);
resetTextControl
frees any memory allocated by either
loadIntxCtlFile
or
loadOutxCtlFile
. It does not free the
TextControl
data structure itself.
resetTextControl
has one argument:
pTextCtl_io
none
5.75.4 Example
#include <stdio.h>
#include <string.h>
#include "textctl.h"        /* includes strclass.h */
#include "rpterror.h"
...
char          szIntxFilename_g[200];
TextControl   sTextControl_g;
StringClass * pStringClasses_g = NULL;
static TextControl sDefaultTextControl_m = {
    NULL,   /* filename */
    NULL,   /* ordered array of lowercase letters */
    NULL,   /* ordered array of matching uppercase letters */
    NULL,   /* array of caseless letters */
    NULL,   /* list of input orthography changes */
    NULL,   /* list of output (orthography) changes */
    NULL,   /* list of format markers (fields) to include */
    NULL,   /* list of format markers (fields) to exclude */
    '\\',   /* initial character of format markers (field codes) */
    '%',    /* character for marking ambiguities and failures */
    '-',    /* character for marking decomposition */
    '|',    /* initial character of secondary format markers */
    NULL,   /* (Manuscripter) bar codes */
    TRUE,   /* flag whether to capitalize individual letters */
    TRUE,   /* flag whether to decapitalize/recapitalize */
    100     /* maximum number of decapitalization alternatives */
};
...
memcpy(&sTextControl_g, &sDefaultTextControl_m, sizeof(TextControl));
fprintf(stderr, "Text Control File (xxINTX.CTL) [none]: ");
fgets( szIntxFilename_g, 200, stdin );
/* fgets keeps the newline, so strip it before testing for an empty name */
szIntxFilename_g[strcspn(szIntxFilename_g, "\r\n")] = '\0';
if (szIntxFilename_g[0])
{
    /* the structure and the class list are passed by address */
    if (loadIntxCtlFile(szIntxFilename_g, ';',
                        &sTextControl_g, &pStringClasses_g) != 0)
    {
        reportError(ERROR_MSG, "Error reading text control file %s\n",
                    szIntxFilename_g);
    }
}
if (    (sTextControl_g.cBarMark    == NUL)
     && (sTextControl_g.pszBarCodes != NULL) )
{
    freeMemory(sTextControl_g.pszBarCodes);
    sTextControl_g.pszBarCodes = NULL;
}
if (    (sTextControl_g.cBarMark    != NUL)
     && (sTextControl_g.pszBarCodes == NULL) )
{
    sTextControl_g.pszBarCodes = (unsigned char *)duplicateString(
                                                        "bdefhijmrsuvyz");
}
...
resetTextControl(&sTextControl_g);
`resetxtc.c'
#include "textctl.h" /* or template.h or opaclib.h */ void resetWordFormationChars(TextControl * pTextCtl_io);
resetWordFormationChars
erases the stored information about word
formation characters stored by previous calls to either
addWordFormationChars
or addLowerUpperWFChars
.
This frees any allocated memory and sets the relevant pointers to
NULL
.
resetWordFormationChars
has one argument:
pTextCtl_io
none
5.76.4 Example See section 5.2 addLowerUpperWFChars.
5.76.5 Source File `myctype.c'
#include "allocmem.h" /* or opaclib.h */ void setAllocMemoryTracing(const char * pszFilename_in);
setAllocMemoryTracing
turns debugging on (if a filename is
given) or off (if pszFilename_in
is NULL
). If debugging
is on, every call to allocMemory
, reallocMemory
, and
freeMemory
is logged to the given file for postmortem analysis.
Calls to duplicateString
are logged as calls to
allocMemory
, which duplicateString
calls internally.
setAllocMemoryTracing
has one argument:
pszFilename_in
NULL
.
none
5.77.4 Example
#include <stdlib.h> #include "allocmem.h" ... extern int getopt(int argc, char * const argv[], const char *opts); extern char * optarg; ... int main(int argc, char ** argv) { void * pTrapAddress = NULL; unsigned iTrapCount = 0; int k; char * p; ... while ((k = getopt(argc, argv, "ai:o:x:z:Z:")) != EOF) { switch (k) { ... case 'z': /* memory allocation trace filename */ setAllocMemoryTracing(optarg); break; case 'Z': /* memory allocation trap address,count */ pTrapAddress = (void *)strtoul(optarg, &p, 10); if (*p == ',') iTrapCount = (unsigned)strtoul(p+1, NULL, 10); if (iTrapCount == 0) iTrapCount = 1; setAllocMemoryTrap(pTrapAddress, iTrapCount); break; ... } } ... }
`allocmem.c'
#include "allocmem.h" /* or opaclib.h */ void setAllocMemoryTrap(const void * pAddress_in, int iCount_in);
setAllocMemoryTrap
sets a trap for the iCount_in
'th
reference to the address pAddress_in
by either
allocMemory
or freeMemory
. This can be useful for
tracking down memory allocation bugs.
The arguments to setAllocMemoryTrap
are as follows:
pAddress_in
iCount_in
none
5.78.4 Example See section 5.77 setAllocMemoryTracing.
5.78.5 Source File `allocmem.c'
#include "opaclib.h" unsigned long showAmbiguousProgress(unsigned uiAmbiguityCount_in, unsigned long uiItemCount_in);
showAmbiguousProgress
displays the progress of the program in a
rudimentary fashion. If uiAmbiguityCount_in
is 0, then a star
(`*') is written to the screen, and if uiAmbiguityCount_in
is 1, then a dot (`.') is written to the screen. Otherwise, if
uiAmbiguityCount_in
is less than 10, the count digit is written,
and if it is greater than or equal to 10, a greater than sign
(`>') is written. These progress characters are grouped in
bunches of 10, with 5 bunches on a line and space between each bunch.
Every other line ends with the total count of items thus far
(uiItemCount_in
).
The arguments to showAmbiguousProgress
are as follows:
uiAmbiguityCount_in
uiItemCount_in
the updated value for uiItemCount_in
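The rules above translate into a small amount of code. The following is a sketch, not the library's source: both function names are hypothetical, and it assumes that the item count grows by one per call, that the output stream is passed in explicitly, and that the count is printed at the end of the second, fourth, and later even-numbered lines.

```c
#include <stdio.h>

/* Choose the progress character for one item. */
static char sketchProgressChar(unsigned uiAmbiguityCount_in)
{
    if (uiAmbiguityCount_in == 0)
        return '*';                     /* count of 0 */
    if (uiAmbiguityCount_in == 1)
        return '.';                     /* count of 1 */
    if (uiAmbiguityCount_in < 10)
        return (char)('0' + uiAmbiguityCount_in);
    return '>';                         /* ten or more */
}

/*
 *  Write one progress character to pOut_in and return the updated count.
 *  ASSUMPTIONS: the count is incremented by one per call, the output
 *  stream is a parameter, and the count is printed on lines 2, 4, ...
 */
static unsigned long sketchShowProgress(FILE *        pOut_in,
                                        unsigned      uiAmbiguityCount_in,
                                        unsigned long uiItemCount_in)
{
    ++uiItemCount_in;
    fputc(sketchProgressChar(uiAmbiguityCount_in), pOut_in);
    if ((uiItemCount_in % 50) == 0)     /* 5 bunches of 10 fill a line */
    {
        if ((uiItemCount_in % 100) == 0)
            fprintf(pOut_in, " %lu", uiItemCount_in);
        fputc('\n', pOut_in);
    }
    else if ((uiItemCount_in % 10) == 0)
        fputc(' ', pOut_in);            /* space between bunches */
    return uiItemCount_in;
}
```

The caller is expected to feed the returned count back in on the next call, which is why the function returns the updated value instead of keeping a counter of its own.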
5.79.4 Example See section 5.38 freeWordTemplate.
5.79.5 Source File `ambprog.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ StringList * squeezeStringList(StringList * pList_io);
squeezeStringList
removes any redundant strings from a list of
strings.
squeezeStringList
has one argument:
pList_io
a pointer to the (possibly smaller) list of strings
5.80.4 Example
#include "template.h" /* includes strlist.h */ ... static WordTemplate * pTemplate_m = NULL; ... /* * eliminate identical results */ pTemplate_m->pNewWords = squeezeStringList( pTemplate_m->pNewWords );
`sqz_sl.c'
#include "opaclib.h" unsigned char * tokenizeString(unsigned char * pszString_in, const unsigned char * pszSeparate_in);
tokenizeString
splits the string (pszString_in
) into a
sequence of zero or more text tokens separated by spans of one or more
characters from pszSeparate_in
. Only the initial call provides
a value for pszString_in
; successive calls must use a
NULL
pointer for the first argument. The first separator
character following the token in pszString_in
is replaced by a
NUL
character. Subsequent calls to tokenizeString
work
through pszString_in
sequentially. Note that
pszSeparate_in
may change from one call to the next.
tokenizeString
is like strtok
except that it
operates on strings of unsigned char
rather than strings of
char
.
The arguments to tokenizeString
are as follows:
pszString_in
NUL
-terminated character string, or NULL
.
pszSeparate_in
NUL
-terminated set of separator characters, or
NULL
. If it is NULL
, then the rest of the string is
returned as the token.
a pointer to the next token extracted from the input string, or
NULL
if no more tokens exist
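The behavior described above, including the special case of a `NULL` separator set, can be sketched with the standard `strspn` and `strcspn` functions. This is an illustration only: `sketchTokenize` is a hypothetical name, and keeping the scan position in a static variable (as `strtok` does) is an assumption about how the library remembers its place.

```c
#include <stddef.h>
#include <string.h>

/*
 *  strtok-like tokenizer for unsigned char strings.
 *  ASSUMPTION: like strtok, the position is kept in a static variable, so
 *  the function is not reentrant.
 */
static unsigned char * sketchTokenize(unsigned char *       pszString_in,
                                      const unsigned char * pszSeparate_in)
{
    static unsigned char * pszNext_s = NULL;
    unsigned char *        pszToken;
    size_t                 uiLength;

    if (pszString_in != NULL)
        pszNext_s = pszString_in;       /* initial call */
    if ((pszNext_s == NULL) || (*pszNext_s == '\0'))
        return NULL;
    if (pszSeparate_in == NULL)
    {
        pszToken  = pszNext_s;          /* rest of the string is the token */
        pszNext_s = NULL;
        return pszToken;
    }
    /* skip any leading separators */
    pszNext_s += strspn((char *)pszNext_s, (const char *)pszSeparate_in);
    if (*pszNext_s == '\0')
    {
        pszNext_s = NULL;
        return NULL;
    }
    pszToken = pszNext_s;
    uiLength = strcspn((char *)pszToken, (const char *)pszSeparate_in);
    if (pszToken[uiLength] != '\0')
    {
        pszToken[uiLength] = '\0';      /* replace the first separator */
        pszNext_s = pszToken + uiLength + 1;
    }
    else
        pszNext_s = NULL;               /* reached the end of the string */
    return pszToken;
}
```

Note that the input string is modified in place, so it must not be a string literal or any other read-only buffer.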
5.81.4 Example
#include "opaclib.h"
...
/* tokenizeString works on unsigned char strings, so declare them that way */
unsigned char   szWhitespace_m[7] = " \n\r\t\f\v";
unsigned char   szInputBuffer_m[1024];
unsigned char * pszToken;
...
for ( pszToken = tokenizeString(szInputBuffer_m, szWhitespace_m) ;
      pszToken != NULL ;
      pszToken = tokenizeString(NULL, szWhitespace_m) )
{
    ...
}
...
`tokenize.c'
#include "opaclib.h" char * trimTrailingWhitespace(char * pszString_io);
trimTrailingWhitespace
removes any trailing white space
characters from the input string.
trimTrailingWhitespace
has one argument:
pszString_io
a pointer to the beginning of the input string
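Trimming in place amounts to moving the terminating NUL to the left. The following minimal sketch shows the idea; `sketchTrimTrailing` is a hypothetical name, and the use of `isspace` to decide what counts as white space is an assumption, since the library may use its own character set.

```c
#include <ctype.h>
#include <string.h>

/*
 *  Chop trailing white space by moving the terminating NUL leftward.
 *  ASSUMPTION: "white space" means whatever isspace accepts in the
 *  current locale; the library's own definition may differ.
 */
static char * sketchTrimTrailing(char * pszString_io)
{
    size_t uiLength;

    if (pszString_io == NULL)
        return NULL;
    uiLength = strlen(pszString_io);
    while (    (uiLength > 0)
            && isspace((unsigned char)pszString_io[uiLength - 1]) )
        --uiLength;
    pszString_io[uiLength] = '\0';
    return pszString_io;
}
```

Returning the original pointer lets the call be nested inside another expression, although the example below simply ignores the return value.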
5.82.4 Example
#include "opaclib.h" ... static char szWhitespace_m[7] = " \t\r\n\f\v"; ... FILE * pRulesFP; unsigned uiLineNumber; char * pszToken; ... for ( uiLineNumber = 1 ;;) { pszToken = readLineFromFile(pRulesFP, &uiLineNumber, ';'); if (pszToken == NULL) break; /* * skip leading spaces and remove trailing spaces */ pszToken += strspn(pszToken, szWhitespace_m); if (*pszToken == NUL) continue; trimTrailingWhitespace(pszToken); ... }
`trimspac.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ void unlinkStringList(StringList ** ppList_io);
unlinkStringList
frees the StringList
data structures in
a list of strings, while leaving intact the strings they point to.
The arguments to unlinkStringList
are as follows:
ppList_io
none
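The distinction between freeing the list nodes and freeing the strings can be sketched as follows. This is not the library's code: `SketchStringList` is a stand-in that assumes only the `pNext` and `pszString` fields used by other examples in this manual, `sketchUnlinkList` is a hypothetical name, and resetting the caller's pointer to `NULL` is an assumption suggested by the pointer-to-pointer argument.

```c
#include <stdlib.h>

/* Stand-in for StringList; only the pNext and pszString fields that the
 * examples in this manual use are assumed. */
typedef struct sketch_strlist {
    struct sketch_strlist * pNext;
    char *                  pszString;
} SketchStringList;

/*
 *  Free every node in the list but none of the strings, which are assumed
 *  to be owned (and later freed) by some other data structure.
 *  ASSUMPTION: the caller's list pointer is reset to NULL, which is what
 *  taking a pointer to the list pointer suggests.
 */
static void sketchUnlinkList(SketchStringList ** ppList_io)
{
    SketchStringList * pNode;
    SketchStringList * pNext;

    if (ppList_io == NULL)
        return;
    for ( pNode = *ppList_io ; pNode != NULL ; pNode = pNext )
    {
        pNext = pNode->pNext;           /* save the link before freeing */
        free(pNode);
    }
    *ppList_io = NULL;
}
```

This kind of function is useful when a list was built as a temporary index over strings that live elsewhere, so that freeing the strings here would leave dangling pointers.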
5.83.4 Example
#include "strlist.h"
...
StringList * pList;
...
unlinkStringList(&pList);   /* takes the address of the list pointer */
pList = NULL;
`unlst_sl.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ char * updateStringList(StringList ** ppList_io, const char * pszString_in);
updateStringList
adds the string to the list if it is not
already in the list. This function is similar to
mergeIntoStringList
, except that it has a different argument and
returns a different value.
The arguments to updateStringList
are as follows:
ppList_io
pszString_in
a pointer to the copy of pszString_in
stored in the list of
strings
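The "add only if missing, then return the stored copy" behavior can be sketched as shown below. The names `SketchStringList` and `sketchUpdateList` are hypothetical, the structure assumes only the `pNext` and `pszString` fields used by other examples in this manual, and both the insertion at the head of the list and the `NULL` return on allocation failure are assumptions (the library presumably uses its own allocation functions).

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for StringList; only the pNext and pszString fields that the
 * examples in this manual use are assumed. */
typedef struct sketch_strlist {
    struct sketch_strlist * pNext;
    char *                  pszString;
} SketchStringList;

/*
 *  Return the stored copy of pszString_in, adding one if necessary.
 *  ASSUMPTIONS: new strings are added at the head of the list, and NULL
 *  is returned if memory runs out.
 */
static char * sketchUpdateList(SketchStringList ** ppList_io,
                               const char *        pszString_in)
{
    SketchStringList * pNode;

    if ((ppList_io == NULL) || (pszString_in == NULL))
        return NULL;
    for ( pNode = *ppList_io ; pNode != NULL ; pNode = pNode->pNext )
    {
        if (strcmp(pNode->pszString, pszString_in) == 0)
            return pNode->pszString;    /* already stored */
    }
    pNode = malloc(sizeof *pNode);
    if (pNode == NULL)
        return NULL;
    pNode->pszString = malloc(strlen(pszString_in) + 1);
    if (pNode->pszString == NULL)
    {
        free(pNode);
        return NULL;
    }
    strcpy(pNode->pszString, pszString_in);
    pNode->pNext = *ppList_io;
    *ppList_io   = pNode;
    return pNode->pszString;
}
```

Because equal strings always yield the same stored pointer, callers can keep that pointer and later compare pointers instead of comparing string contents.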
5.84.4 Example
#include "strlist.h" ... static StringList * pCategories_m; static char szBuffer_m[100]; ... char * pszCategory; ... pszCategory = updateStringList( &pCategories_m, szBuffer_m ); ...
`updat_sl.c'
#include "trie.h" /* or opaclib.h */ void walkTrie(Trie * pTrieHead_in, void (* pfWalk_in)(void * pList_in));
walkTrie
walks through a trie, processing the information stored
at each node.
The arguments to walkTrie
are as follows:
pTrieHead_in
pfWalk_in
pList_in
Trie
node
(Trieinfo
).
none
5.85.4 Example
#include <stdio.h>
#include "trie.h"
#include "rpterror.h"
...
typedef struct lex_item {
    struct lex_item * pLink;        /* link to next item */
    struct lex_item * pNext;        /* link to next homograph */
    unsigned char *   pszForm;      /* lexical form (word) */
    unsigned char *   pszGloss;     /* lexical gloss */
    unsigned short    uiCategory;   /* lexical category */
} LexItem;
...
Trie * pLexicon_g;
FILE * pLexiconFP_m;
...
static void write_lex_items(void * pList_in)
{
    LexItem * pLex;

    if (pLexiconFP_m == NULL)
        return;
    for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink )
    {
        fprintf(pLexiconFP_m, "%-20s %-20s %s\n",
                pLex->pszForm, pLex->pszGloss,
                get_lexical_category_name(pLex->uiCategory));
    }
}

/* the output filename is passed in; the callback finds the open file
 * through pLexiconFP_m because walkTrie passes it only the stored data */
void write_lexicon(const char * pszLexiconFile_in)
{
    if (pszLexiconFile_in == NULL)
    {
        reportError(WARNING_MSG, "Missing output lexicon filename\n");
        return;
    }
    pLexiconFP_m = fopen(pszLexiconFile_in, "w");
    if (pLexiconFP_m == NULL)
    {
        reportError(WARNING_MSG, "Cannot open lexicon file %s for output\n",
                    pszLexiconFile_in);
        return;
    }
    walkTrie(pLexicon_g, write_lex_items);
    fclose(pLexiconFP_m);
}
`trie.c'
#include "allocmem.h" /* or opaclib.h */ void writeAllocMemoryDebugMsg(const char * pszFormat_in, ...);
writeAllocMemoryDebugMsg
writes a message to the memory
allocation tracing file if it is open, and does nothing if that file is
not open. The memory allocation tracing file is opened and closed by
setAllocMemoryTracing
. writeAllocMemoryDebugMsg
is
similar to printf
except that it writes to a specific (optional)
file rather than to the standard output.
The arguments to writeAllocMemoryDebugMsg
are as follows:
pszFormat_in
printf
style format string for the message.
...
pszFormat_in
).
none
5.86.4 Example
#include "allocmem.h" #include "strlist.h" ... StringList * pStrings; ... writeAllocMemoryDebugMsg("deleting %u strings\n", getStringListSize(pStrings)); freeStringList(pStrings); pStrings = NULL;
`allocmem.c'
#include "change.h" void writeChange(const Change * pChange_in, FILE * pOutputFP_in);
writeChange
writes the given Change
data structure to the
output file as a human readable string consisting of a pair of quoted
strings followed by the environment constraint (if any).
The arguments to writeChange
are as follows:
pChange_in
pNext
field of the Change
data structure is ignored.)
pOutputFP_in
none
5.87.4 Example
#include <stdio.h> #include "change.h" ... void writeChangeList(FILE * pOutputFP_in, Change * pChanges_in) { Change * cp; if (pOutputFP_in == NULL) return; for ( cp = pChanges_in ; cp ; cp = cp->pNext ) writeChange(cp, pOutputFP_in); }
`change.c'
5.88.1 Syntax
#include "record.h" void writeCodeTable(FILE * pOutputFP_in, const CodeTable * pTable_in);
writeCodeTable
writes the contents of a CodeTable
data
structure to a file. The output is useful only for debugging.
The arguments to writeCodeTable
are as follows:
pOutputFP_in
pTable_in
CodeTable
data structure.
none
5.88.4 Example
#include "record.h" #include "ample.h" AmpleData sAmpleData_g; char szCodesFilename_g[100]; ... loadAmpleDictCodeTables(szCodesFilename_g, &sAmpleData_g, FALSE); writeCodeTable( sAmpleData_g.pLogFP, sAmpleData_g.pPrefixTable );
`loadtb.c'
#include "strclass.h" /* or change.h or textctl.h or template.h or opaclib.h */ void writeStringClasses(FILE * pOutputFP_in, const StringClass * pClasses_in);
writeStringClasses
writes the contents of all the string classes
in the list to a file.
The arguments to writeStringClasses
are as follows:
pOutputFP_in
pClasses_in
none
5.89.4 Example
#include <stdio.h>
#include "strclass.h"
...
static StringClass * pClasses_m;
...
writeStringClasses(stdout, pClasses_m);
...
`strcla.c'
#include "strlist.h" /* or strclass.h or change.h or textctl.h or template.h or opaclib.h */ void writeStringList(const StringList * pList_in, const char * pszSep_in, FILE * pOutputFP_in);
writeStringList
writes a list of strings to an output file,
separating the individual strings in the list by the indicated string.
The arguments to writeStringList
are as follows:
pList_in
pszSep_in
pOutputFP_in
none
5.90.4 Example
#include <stdio.h> #include "strlist.h" ... static StringList * pCategories_m; ... void showCategories() { printf("Categories: "); writeStringList(pCategories_m, " ", stdout); printf("\n"); }
`write_sl.c'
#include "template.h" /* or opaclib.h */ void writeTemplate(FILE * pOutputFP_in, const char * pszFilename_in, const WordTemplate * pTemplate_in, const TextControl * pTextCtl_in);
writeTemplate
writes the results of a morphological analysis as
a database. Each word is a record with these fields:
\a
\d
\cat
\p
\fd
\u
\w
\f
\c
\n
Ambiguities are marked as %n%Anal1%Anal2%...%Analn%
.
Failures are marked as %0%OriginalWord%
or %0%%
.
(The separation character can be set to something other than %
.)
The arguments to writeTemplate
are as follows:
pOutputFP_in
pszFilename_in
pTemplate_in
pTextCtl_in
none
5.91.4 Example See section 5.38 freeWordTemplate.
5.91.5 Source File `dtbout.c'
#include "template.h" /* or opaclib.h */ void writeTextFromTemplate(FILE * pOutputFP_in, const WordTemplate * pTemplate_in, const TextControl * pTextCtl_in);
writeTextFromTemplate
writes the results of a morphological
synthesis to an output file, restoring all the formatting information
associated with the word in the original input to analysis.
Ambiguities are marked as %n%Word1%Word2%...%Wordn%
.
Failures are marked as %0%OriginalWord%
.
(The separation character can be set to something other than %
.)
The arguments to writeTextFromTemplate
are as follows:
pOutputFP_in
pTemplate_in
pTextCtl_in
none
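The marking scheme for ambiguities and failures can be sketched as a small formatting routine. This is an illustration only: `sketchMarkResults` is a hypothetical name, it builds the text in a caller-supplied buffer instead of writing to a file, the marking character is passed directly rather than taken from a `TextControl` structure, and writing a single unambiguous result without any marking is an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 *  Format the results for one word into pszOut_out.
 *  ASSUMPTIONS: a single result is written without any marking, and the
 *  marking character is a plain parameter.  Output that does not fit in
 *  the buffer is silently truncated.
 */
static void sketchMarkResults(char *               pszOut_out,
                              size_t               uiSize_in,
                              const char * const * ppszWords_in,
                              unsigned             uiCount_in,
                              const char *         pszOriginal_in,
                              char                 cAmbig_in)
{
    size_t   uiUsed;
    unsigned i;

    if ((pszOut_out == NULL) || (uiSize_in == 0))
        return;
    if (uiCount_in == 1)
    {
        snprintf(pszOut_out, uiSize_in, "%s", ppszWords_in[0]);
        return;
    }
    /* the "%n%" prefix, which is "%0%" for a failure */
    snprintf(pszOut_out, uiSize_in, "%c%u%c",
             cAmbig_in, uiCount_in, cAmbig_in);
    if (uiCount_in == 0)
    {
        uiUsed = strlen(pszOut_out);
        snprintf(pszOut_out + uiUsed, uiSize_in - uiUsed, "%s%c",
                 pszOriginal_in ? pszOriginal_in : "", cAmbig_in);
        return;
    }
    for ( i = 0 ; i < uiCount_in ; ++i )
    {
        uiUsed = strlen(pszOut_out);
        snprintf(pszOut_out + uiUsed, uiSize_in - uiUsed, "%s%c",
                 ppszWords_in[i], cAmbig_in);
    }
}
```

With two results the buffer holds text of the form `%2%Word1%Word2%`, and with none it holds `%0%OriginalWord%`, matching the patterns given above.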
5.92.4 Example See section 5.65 readTemplateFromAnalysis.
5.92.5 Source File `textout.c'
#include "trie.h" /* or opaclib.h */ void writeTrieData(Trie * pTrieHead_in, void (* pfWriteInfo_in)(void * pList_in, int iIndent_in, FILE * pOutputFP_in), FILE * pOutputFP_in);
writeTrieData
walks through a trie, writing the information
stored at each node to a file. This is intended primarily for
debugging, as the trie structure is explicitly written to the output
file in indented form, together with the information stored in the
trie.
The arguments to writeTrieData
are as follows:
pTrieHead_in
pfWriteInfo_in
pList_in
Trie
node
(Trieinfo
).
iIndent_in
pOutputFP_in
pOutputFP_in
none
5.93.4 Example
#include <stdio.h> #include "trie.h" #include "rpterror.h" ... typedef struct lex_item { struct lex_item * pLink; /* link to next item */ struct lex_item * pNext; /* link to next homograph */ unsigned char * pszForm; /* lexical form (word) */ unsigned char * pszGloss; /* lexical gloss */ unsigned short uiCategory; /* lexical category */ } LexItem; ... Trie * pLexicon_g; ... static void debug_lex_items(void * pList_in, int iIndent_in, FILE * pOutputFP_in) { LexItem * pLex; int i; if (pOutputFP_in == NULL) return; for ( pLex = (LexItem *)pList_in ; pLex ; pLex = pLex->pLink ) { for ( i = 0 ; i < iIndent_in ; ++i ) fputc(' ', pOutputFP_in); fprintf(pOutputFP_in, "%-20s %-20s %u [%lu -> %lu]\n", pLex->pszForm, pLex->pszGloss, pLex->uiCategory, (unsigned long)pLex, (unsigned long)pLex->pNext); } } void debug_lexicon() { printf("BEGIN LEXICON TRIE DATA\n"); writeTrieData(pLexicon_g, debug_lex_items, stdout); printf("END LEXICON TRIE DATA\n"); }
`trie.c'
#include "template.h" void writeWordAnalysisList(const WordAnalysis * pAnalyses_in, FILE * pOutputFP_in);
writeWordAnalysisList
writes a list of WordAnalysis
data
structures to an output file for debugging purposes.
The arguments to writeWordAnalysisList
are as follows:
pAnalyses_in
WordAnalysis
data structures.
pOutputFP_in
none
5.94.4 Example
#include <stdio.h>
#include "template.h"
...
void dumpWordTemplate(WordTemplate * pTemplate_in, FILE * pOutputFP_in)
{
    if (pOutputFP_in == NULL)
        return;
    if (pTemplate_in == NULL)
    {
        fprintf(pOutputFP_in, "WordTemplate ptr is NULL\n");
        return;
    }
    putc('\n', pOutputFP_in);
    /* each string field is checked so that NULL is never passed to %s */
    fprintf(pOutputFP_in, "    orig_word = \"%s\"\n",
            pTemplate_in->pszOrigWord ? pTemplate_in->pszOrigWord
                                      : "{NULL}" );
    fprintf(pOutputFP_in, "    word      = \"%s\"\n",
            pTemplate_in->paWord && pTemplate_in->paWord[0]
                                      ? pTemplate_in->paWord[0]
                                      : "{NULL}" );
    fprintf(pOutputFP_in, "    format    = \"%s\"\n",
            pTemplate_in->pszFormat   ? pTemplate_in->pszFormat
                                      : "{NULL}" );
    fprintf(pOutputFP_in, "    non_alpha = \"%s\"\n",
            pTemplate_in->pszNonAlpha ? pTemplate_in->pszNonAlpha
                                      : "{NULL}" );
    fprintf(pOutputFP_in, "    capital   = %d\n", pTemplate_in->iCapital );
    writeWordAnalysisList(pTemplate_in->pAnalyses, pOutputFP_in);
    fprintf(pOutputFP_in, "    new_words = ");
    if (pTemplate_in->pNewWords)
    {
        fprintf(pOutputFP_in, "\"");
        writeStringList( pTemplate_in->pNewWords, "\" \"", pOutputFP_in);
        fprintf(pOutputFP_in, "\"\n");
    }
    else
        fprintf(pOutputFP_in, "{NULL}\n");
}
`wordanal.c'
#include "textctl.h" /* or template.h or opaclib.h */ void writeWordFormationChars(FILE * pOutputFP_in, const TextControl * pTextCtl_in);
writeWordFormationChars
writes the set of word formation
characters to an output file. This function depends on previous calls
to addWordFormationChars
and addLowerUpperWFChars
.
The arguments to writeWordFormationChars
are as follows:
pOutputFP_in
pTextCtl_in
none
5.95.4 Example
#include <stdio.h> #include "textctl.h" ... static TextControl sTextCtl_m; ... printf("The word formation characters are:\n"); writeWordFormationChars(stdout, &sTextCtl_m); ...
`myctype.c'
This document was generated on 20 March 2003 using texi2html 1.56k.