GEOS SDK TechDocs
|
|
1.1 The Scanner
|
1.3 Evaluator
Applications will never call the scanner directly. Instead, if they access the parse library directly (instead of through the spreadsheet objects), they will call the parser and pass it a string, and the parser will in turn call the scanner to process the string into tokens. This section will not discuss how to call the parser, since few applications will need to do that; it will instead describe the general workings of the parser.
The parser translates a well-formed string into a sequence of tokens. It calls the scanner to read tokens from the string. It then uses a context-free grammar to make sure the string is well formed. The context-free grammar is described below. The scanner outputs a sequence of parser tokens. The parser tokens are almost identical to the scanner tokens, with a few exceptions; those exceptions are noted below.
The parser is passed a callback routine. The parser calls this routine when it needs information about a token; for example, if it encounters a function it does not recognize, it calls the callback to get a pointer to the function. The details of this are provided in the advanced section.
If the parser is not passed a well-formed expression, or if it is unable to successfully parse the string for some other reason, it returns an error code. The error codes are described at length in the advanced section.
The parser uses a context-free grammar to make sure the string is well-formed. The grammar is listed below. The basic units of the grammar are listed in ALL-CAPS; higher-level units are listed in italics . The string must parse to a well-formed expression .
The parser does not return scanner tokens; instead, it returns a sequence of parser tokens. The parser tokens are almost directly analogous to the scanner tokens. However, a few additional token types are added.
The parser tokens have the same structure as the scanner tokens. The first field is a constant specifying what type of token this is. The second field contains specific information about the token; this field may be blank. The parser has the following types of tokens:
When the parser encounters an identifier that is in the appropriate place for a function name (that is, an identifier followed by a parenthesized argument list), it does not write an identifier token. Instead, it writes a "function" token, which has a one-word data section. This section is the function ID (described in Parser Functions ). If the function's name is not one of a built-in function, it will call the application's callback routine to find out what the function's ID number is; the evaluator will pass this ID when it needs to have the function called.
When the parser encounters an identifier, it asks its caller for an ID number for the identifier. It can then store the ID number instead of the entire string. The evaluator will use this ID number when requesting the value of the identifier. The formatter will use the ID number when requesting the original identifier string associated with the ID number.
When the parser encounters a scanner parenthesis token, it does not necessarily translate it into a parser parenthesis token. This is because parentheses fulfill two separate roles: they specify the order of evaluation, and they delimit function arguments. When the parser encounters parenthesis tokens which specify order of evaluation, it translates them into parser parenthesis tokens. If, however, it encounters argument-delimiting parentheses, it does not need to translate them literally; after all, the presence of a function token implies that it will be followed by an argument list. Thus, the parser does not need to copy the parenthesis tokens. Instead, it copies the tokens of the argument list. When it reaches a list separator, it replaces that with an "end-of-argument" token; when it reaches the closing parenthesis for the function call, it replaces that with a "close-function" token.
Suppose that you call the parser on the text string "3 + SUM(6.5, 3 ^ (4 - 1), C5...F9)". The parser will evaluate the string, one token at a time. When it needs to process a token, it will call the scanner to return the next token in the string. It will then replace these tokens with parser tokens, and write out the sequence of tokens to its output buffer.
For simplicity, this example treats the scanner as if it scanned the entire text stream at once, and returned the entire sequence of tokens to the scanner. In this case, the scanner would translate the text into the following sequence of tokens:
Token Data Comment
NUMBER 3.0 All numbers are floats
OPERATOR OP_ADDITION
IDENTIFIER "SUM"
OPEN_PAREN delimits function args
NUMBER 6.5
LIST_SEPARATOR
NUMBER 3.0
OPERATOR OP_EXPONENTIATION
OPEN_PAREN
NUMBER 4.0
OPERATOR OP_SUBTRACTION_NEGATION Parser figures out which operator this is
NUMBER 1.0
CLOSE_PAREN
LIST_SEPARATOR
CELL C5 Actually stored as "4,2"; row index 4, column index 2
OPERATOR OP_RANGE_SEPARATOR
CELL F9
CLOSE_PAREN
END_OF_EXPRESSION
The parser reads these tokens, one at a time, and writes out an analogous sequence of parser tokens:
Token Data Comment
NUMBER 3.0 All numbers are floats
OPERATOR OP_ADDITION
FUNCTION FUNCTION_ID_SUM
NUMBER 6.5
END_OF_ARG
NUMBER 3.0
OPERATOR OP_EXPONENTIATION
OPEN_PAREN
NUMBER 4.0
OPERATOR OP_SUBTRACTION
NUMBER 1.0
CLOSE_PAREN
END_OF_ARG
CELL C5 Actually stored as "4,2"; row index 4, column index 2
OPERATOR OP_RANGE_SEPARATOR
CELL F9
END_OF_ARG
CLOSE_FUNCTION
END_OF_EXPRESSION
The application does not need to save the original text string. Instead, it can save the buffer containing the parser tokens, and use the formatter to translate the token sequence back into a character string.
GEOS SDK TechDocs
|
|
1.1 The Scanner
|
1.3 Evaluator