Parse Library: 1.2 Parse Library Behavior: The Parser

Up: GEOS SDK TechDocs | Up | Prev: 1.1 The Scanner | Next: 1.3 Evaluator

Applications will never call the scanner directly. Instead, if they access the parse library directly (instead of through the spreadsheet objects), they will call the parser and pass it a string, and the parser will in turn call the scanner to process the string into tokens. This section will not discuss how to call the parser, since few applications will need to do that; it will instead describe the general workings of the parser.

The parser translates a well-formed string into a sequence of tokens. It calls the scanner to read tokens from the string. It then uses a context-free grammar to make sure the string is well formed. The context-free grammar is described below. The scanner outputs a sequence of parser tokens. The parser tokens are almost identical to the scanner tokens, with a few exceptions; those exceptions are noted below.

The parser is passed a callback routine. The parser calls this routine when it needs information about a token; for example, if it encounters a function it does not recognize, it calls the callback to get a pointer to the function. The details of this are provided in the advanced section.

If the parser is not passed a well-formed expression, or if it is unable to successfully parse the string for some other reason, it returns an error code. The error codes are described at length in the advanced section.

The Parser's Grammar

The parser uses a context-free grammar to make sure the string is well-formed. The grammar is listed below. The basic units of the grammar are listed in ALL-CAPS; higher-level units are listed in italics . The string must parse to a well-formed expression .

expression :
'(' expression ') '
NEG_OP expression
IDENTIFIER '(' function_args ')'
base_item more_expression
more_expression :
<empty>
PERCENT_OP more_expression
BINARY_OP expression
function_args :
<empty>
arg_list
arg_list :
expression
expression
',' arg_list
base_item : NUMBER
STRING
CELL_REF
IDENTIFIER

Parser Tokens

The parser does not return scanner tokens; instead, it returns a sequence of parser tokens. The parser tokens are almost directly analogous to the scanner tokens. However, a few additional token types are added.

The parser tokens have the same structure as the scanner tokens. The first field is a constant specifying what type of token this is. The second field contains specific information about the token; this field may be blank. The parser has the following types of tokens:

PARSER_TOKEN_NUMBER
This is the same as the scanner NUMBER token.
PARSER_TOKEN_STRING
This is the same as the scanner STRING token.
PARSER_TOKEN_CELL
This is the same as the scanner CELL token.
PARSER_TOKEN_END_OF_EXPRESSION
This is the same as the scanner END_OF_EXPRESSION token.
PARSER_TOKEN_OPEN_PAREN
This usually replaces the scanner OPEN_PAREN token. However, it is not used if the parenthesis is delimiting function arguments; it is only used if the parenthesis is changing the order of evaluation.
PARSER_TOKEN_CLOSE_PAREN
This usually replaces the scanner CLOSE_PAREN token. However, it is not used if the parenthesis is delimiting function arguments; it is only used if the parenthesis is changing the order of evaluation.
PARSER_TOKEN_NAME
This replaces some occurrences of the scanner IDENTIFIER token; specifically, those where the identifier is not a function name. The data portion is the number for that name.
PARSER_TOKEN_FUNCTION
This replaces some occurrences of the scanner IDENTIFIER token, specifically those in which the identifier is a function name. The data portion is the function ID number.
PARSER_TOKEN_CLOSE_FUNCTION
This replaces some occurrences of the scanner CLOSE_PAREN token; specifically, those where the closing parenthesis delimits function arguments.
PARSER_TOKEN_ARG_END
The parser inserts this token after every argument to a function call; thus, it replaces occurrences of SCANNER_TOKEN_LIST_SEPARATOR, and also occurs after the last argument to a function.
PARSER_TOKEN_OPERATOR
This is the same as the parser's OPERATOR token. The data section is an operator constant, as described above in Operators . Note the parser replaces occurrences of OP_PERCENT_MODULO with either OP_PERCENT or OP_MODULO, as appropriate; similarly, it replaces OP_SUBTRACTION_NEGATION with either OP_SUBTRACTION or OP_NEGATION.

When the parser encounters an identifier that is in the appropriate place for a function name (that is, an identifier followed by a parenthesized argument list), it does not write an identifier token. Instead, it writes a "function" token, which has a one-word data section. This section is the function ID (described in Parser Functions ). If the function's name is not one of a built-in function, it will call the application's callback routine to find out what the function's ID number is; the evaluator will pass this ID when it needs to have the function called.

When the parser encounters an identifier, it asks its caller for an ID number for the identifier. It can then store the ID number instead of the entire string. The evaluator will use this ID number when requesting the value of the identifier. The formatter will use the ID number when requesting the original identifier string associated with the ID number.

When the parser encounters a scanner parenthesis token, it does not necessarily translate it into a parser parenthesis token. This is because parentheses fulfill two separate roles: they specify the order of evaluation, and they delimit function arguments. When the parser encounters parenthesis tokens which specify order of evaluation, it translates them into parser parenthesis tokens. If, however, it encounters argument-delimiting parentheses, it does not need to translate them literally; after all, the presence of a function token implies that it will be followed by an argument list. Thus, the parser does not need to copy the parenthesis tokens. Instead, it copies the tokens of the argument list. When it reaches a list separator, it replaces that with an "end-of-argument" token; when it reaches the closing parenthesis for the function call, it replaces that with a "close-function" token.

An Example of Scanning and Parsing

Suppose that you call the parser on the text string "3 + SUM(6.5, 3 ^ (4 - 1), C5...F9)". The parser will evaluate the string, one token at a time. When it needs to process a token, it will call the scanner to return the next token in the string. It will then replace these tokens with parser tokens, and write out the sequence of tokens to its output buffer.

For simplicity, this example treats the scanner as if it scanned the entire text stream at once, and returned the entire sequence of tokens to the scanner. In this case, the scanner would translate the text into the following sequence of tokens:

Token		Data			Comment
NUMBER		3.0			All numbers are floats
OPERATOR		OP_ADDITION
IDENTIFIER		"SUM"
OPEN_PAREN					delimits function args
NUMBER		6.5
LIST_SEPARATOR
NUMBER		3.0
OPERATOR		OP_EXPONENTIATION
OPEN_PAREN
NUMBER		4.0
OPERATOR		OP_SUBTRACTION_NEGATION
					Parser figures out
					which operator this is
NUMBER		1.0
CLOSE_PAREN
LIST_SEPARATOR
CELL		C5			Actually stored as
		 			"4,2"; row index 4,
					column index 2
OPERATOR		OP_RANGE_SEPARATOR
CELL		F9
CLOSE_PAREN
END_OF_EXPRESSION

The parser reads these tokens, one at a time, and writes out an analogous sequence of parser tokens:

Token		Data			Comment
NUMBER		3.0			All numbers are floats
OPERATOR		OP_ADDITION
FUNCTION		FUNCTION_ID_SUM
NUMBER		6.5
END_OF_ARG
NUMBER		3.0
OPERATOR		OP_EXPONENTIATION
OPEN_PAREN
NUMBER		4.0
OPERATOR		OP_SUBTRACTION
NUMBER		1.0
CLOSE_PAREN
END_OF_ARG
CELL		C5			Actually stored as
		 			"4,2"; row index 4,
					column index 2
OPERATOR		OP_RANGE_SEPARATOR
CELL		F9
END_OF_ARG
CLOSE_FUNCTION
END_OF_EXPRESSION

The application does not need to save the original text string. Instead, it can save the buffer containing the parser tokens, and use the formatter to translate the token sequence back into a character string.


Up: GEOS SDK TechDocs | Up | Prev: 1.1 The Scanner | Next: 1.3 Evaluator