It was quite fun to write a BASIC interpreter in modern C++. BASIC itself is a simple language that recalls the early days of personal computing, when each computer, such as the glorious Commodore 64, had an interpreter embedded inside.

Main interpreter components

To write the nuBASIC interpreter I followed an approach that mainly consists of parsing the BASIC source into a parse tree and then executing it. I wrote the following main components, although they often do not correspond one-to-one with a single class or other C++ element:
A line-oriented language interpreter

The BASIC language is line-oriented, and this is one of the key differences between BASIC and other programming languages, in which the position of hard line breaks in the source code is irrelevant. Each code line in a BASIC program forms a self-contained unit. For this reason the nuBASIC interpreter is itself line-oriented: the program source text is split into lines, which are owned by the Interpreter class (source lines are stored in a map of <line-number, text-line> pairs). Each code line is parsed into a self-contained execution unit. The interpreter builds a static program context, which acts as the glue among program lines (and statements). Indeed, the Statement Parser recognizes complex language constructs even when they are split across different lines, and builds metadata that refers to them. Each line of a control structure can also contain more than one statement.

Handling the tokens

The parsing of each line is preceded by a separate lexical analysis performed by the Tokenizer, which creates tokens from the source text.

Token list container

To reduce parser complexity, a Token List container class, wrapped around a standard deque, has been provided. The Token List class adds some facilities intended to simplify the handling of token lists and to reduce the complexity of the parser implementation.
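To make the Tokenizer and Token List ideas concrete, here is a minimal sketch, not nuBASIC's actual code. The names Token, TokenList and tokenize() are hypothetical, the token carries only a kind and its text, and numbers are limited to unsigned integers to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical token type; the real nuBASIC token carries more fields.
struct Token {
    enum class Kind { Identifier, Number, String, Operator };
    Kind kind;
    std::string text;
};

// Thin wrapper around std::deque offering a few parser-friendly helpers.
class TokenList {
public:
    void push_back(Token t) { tokens_.push_back(std::move(t)); }
    bool empty() const { return tokens_.empty(); }
    std::size_t size() const { return tokens_.size(); }

    // Look at the next token without consuming it.
    const Token& front() const {
        assert(!tokens_.empty());
        return tokens_.front();
    }

    // Consume and return the next token.
    Token pop_front() {
        assert(!tokens_.empty());
        Token t = std::move(tokens_.front());
        tokens_.pop_front();
        return t;
    }

private:
    std::deque<Token> tokens_;
};

// Lexical analysis of one source line; whitespace only separates tokens.
TokenList tokenize(const std::string& line) {
    TokenList out;
    std::size_t i = 0;
    while (i < line.size()) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;
        if (std::isalpha(c)) {
            while (i < line.size() && std::isalnum(static_cast<unsigned char>(line[i]))) ++i;
            out.push_back({Token::Kind::Identifier, line.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) ++i;
            out.push_back({Token::Kind::Number, line.substr(start, i - start)});
        } else if (c == '"') {
            // String literal: the token text excludes the surrounding quotes.
            const std::size_t close = line.find('"', start + 1);
            const std::size_t end = (close == std::string::npos) ? line.size() : close;
            out.push_back({Token::Kind::String, line.substr(start + 1, end - start - 1)});
            i = (end < line.size()) ? end + 1 : end;
        } else {
            // Anything else is treated as a one-character operator or separator.
            out.push_back({Token::Kind::Operator, std::string(1, line[i])});
            ++i;
        }
    }
    return out;
}
```

A parser built on such a container can peek with front() to decide which rule applies and then consume tokens with pop_front(), which is the kind of convenience the Token List class is meant to provide.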
Parsing the code

While there is a single Tokenizer, more than one Parser has been implemented:
(Figure: expression parse tree containing a binary_expression node.)

The abstract class expr_any_t defines the virtual method eval(), which subclasses implement. The prototype of the eval() method is the following:
Variant

The Variant class is provided to manipulate several distinct BASIC language types in a uniform manner. This drastically reduces the complexity of the evaluator. I did not use a C++ union because it is designed for plain old data types and is not well suited to non-trivial types (since C++11 a union may contain them, but their construction and destruction must be managed manually).

Tracing execution of a simple program

Let us consider the following simple BASIC program, consisting of a single line with a single statement:
Suppose you have already entered the program, so you just have to type “RUN” to execute it. First, the nuBASIC_console() function reads the command string “RUN” from standard input, then invokes exec_command(), a helper function that calls the interpreter's exec_command() method and catches any exceptions. This method parses the command in order to recognize it and perform the required action. In this case it calls the interpreter's rebuild() method, which clears the static and run-time context objects (removing both dynamic data and metadata) and then calls the update_program() method for each source line. This method creates a Tokenizer object and calls the compile_line() method of the Statement Parser object, which is held as an attribute of the Interpreter object. The compile_line() method uses the Tokenizer to break the source line down into a list of language tokens like the following (for simplicity, not all token fields are shown):
Each line of code is treated as a “block”, which is a container of statements, so the parse_block() method is called first. This method iterates while the token list is not empty, calling parse_stmt() on each iteration; parse_stmt() recognizes the statement in order to select the specific parse_xxx() method. In our example, it recognizes the token “PRINT” (the first token in the token list) and calls parse_print(), which builds a stmt_print_t object holding an expression object instance. The parse_print() method calls the template function parse_arg_list(), which builds an expression list for the PRINT statement. Each item in the list is an expression object built by the Expression Parser, ready to be evaluated by its eval() method against a run-time program (execution) context. Finally, parse_print() returns to the calling parse_block(), which in turn returns a handle to the statement block object by means of a standard smart (shared) pointer to the class object. The statement handle is stored in a map (prog_line_t) whose key is the number of the line being processed.

After building the program, the interpreter calls the run() method, which creates a program_t object instance and passes it the program line and program context objects produced by the previous build phase. The program object executes each code line (represented by a block statement object, as discussed above) by calling the related virtual run() method. In our example the only program line (a block statement object) contains the print statement object. Calling its run() method finally invokes the run() method of the print statement.
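The parse and run flow described above can be sketched as follows. This is a simplified illustration under several assumptions, not the nuBASIC source: a token is reduced to its text, PRINT keeps its arguments as raw text instead of expression objects, ':' is assumed to be the statement separator, and the names Stmt, Block, PrintStmt, Context, ProgLines and run_program() are hypothetical. Only parse_block(), parse_stmt() and parse_print() borrow their names from the text.

```cpp
#include <deque>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Tokens = std::deque<std::string>;  // simplified: a token is just its text

// Hypothetical run-time context: here only the output stream.
struct Context {
    std::ostream& out;
};

struct Stmt {
    virtual ~Stmt() = default;
    virtual void run(Context& ctx) const = 0;
};

// A block is a container of statements; one block is built per source line.
struct Block : Stmt {
    std::vector<std::shared_ptr<Stmt>> stmts;
    void run(Context& ctx) const override {
        for (const auto& s : stmts) s->run(ctx);
    }
};

// Simplified PRINT: arguments are kept as raw text, not expression objects.
struct PrintStmt : Stmt {
    std::vector<std::string> args;
    void run(Context& ctx) const override {
        for (const auto& a : args) ctx.out << a;
        ctx.out << '\n';
    }
};

// Consume the PRINT arguments up to the assumed ':' separator or end of line.
std::shared_ptr<Stmt> parse_print(Tokens& tl) {
    auto stmt = std::make_shared<PrintStmt>();
    while (!tl.empty() && tl.front() != ":") {
        if (tl.front() != ",") stmt->args.push_back(tl.front());
        tl.pop_front();
    }
    return stmt;
}

// Recognize the statement from its first token and dispatch to its parser.
std::shared_ptr<Stmt> parse_stmt(Tokens& tl) {
    const std::string keyword = tl.front();
    tl.pop_front();
    if (keyword == "PRINT") return parse_print(tl);
    throw std::runtime_error("unknown statement: " + keyword);
}

// Iterate while the token list is not empty, one statement per iteration.
std::shared_ptr<Block> parse_block(Tokens tl) {
    auto block = std::make_shared<Block>();
    while (!tl.empty()) {
        if (tl.front() == ":") { tl.pop_front(); continue; }
        block->stmts.push_back(parse_stmt(tl));
    }
    return block;
}

// Line number -> compiled block, similar in spirit to prog_line_t.
using ProgLines = std::map<int, std::shared_ptr<Block>>;

// Execute the lines in ascending line-number order.
void run_program(const ProgLines& prog, Context& ctx) {
    for (const auto& entry : prog) entry.second->run(ctx);
}
```

To try it, store the result of parse_block() in a ProgLines map under a line number and call run_program() with a Context that wraps std::cout or a std::ostringstream. Because std::map keeps its keys sorted, the lines run in line-number order regardless of the order in which they were entered.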
The run() method of stmt_print_t evaluates each argument in its argument list. The argument list is just a collection of expression objects that expose the eval() method. The eval() method returns a variant object that can be printed to standard output.
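The interplay between expression objects, the variant type and the print statement can be illustrated with the sketch below. It is only an approximation of the idea: std::variant (C++17) stands in for nuBASIC's own Variant class, and the names Value, Context, Expr, Literal, Add and PrintStmt are hypothetical rather than taken from the interpreter.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for nuBASIC's Variant class (assumption: std::variant, C++17).
using Value = std::variant<int, double, std::string>;

// Hypothetical run-time context: here only the output stream.
struct Context {
    std::ostream& out;
};

// Abstract expression node, playing the role the text gives to expr_any_t.
struct Expr {
    virtual ~Expr() = default;
    virtual Value eval(Context& ctx) const = 0;
};

struct Literal : Expr {
    explicit Literal(Value v) : value(std::move(v)) {}
    Value eval(Context&) const override { return value; }
    Value value;
};

// Add two values: int + int stays int, mixed numbers become double,
// two strings are concatenated, anything else is a type mismatch.
Value add(const Value& a, const Value& b) {
    return std::visit([](const auto& x, const auto& y) -> Value {
        using X = std::decay_t<decltype(x)>;
        using Y = std::decay_t<decltype(y)>;
        if constexpr (std::is_same_v<X, std::string> && std::is_same_v<Y, std::string>) {
            return x + y;
        } else if constexpr (std::is_same_v<X, int> && std::is_same_v<Y, int>) {
            return x + y;
        } else if constexpr (std::is_arithmetic_v<X> && std::is_arithmetic_v<Y>) {
            return static_cast<double>(x) + static_cast<double>(y);
        } else {
            throw std::runtime_error("type mismatch");
        }
    }, a, b);
}

// Binary node: evaluates both children, then combines the results.
struct Add : Expr {
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    Value eval(Context& ctx) const override {
        return add(lhs->eval(ctx), rhs->eval(ctx));
    }
    std::unique_ptr<Expr> lhs, rhs;
};

// PRINT-like statement: evaluate each argument and write the result.
struct PrintStmt {
    std::vector<std::unique_ptr<Expr>> args;
    void run(Context& ctx) const {
        for (const auto& arg : args) {
            const Value v = arg->eval(ctx);
            std::visit([&ctx](const auto& x) { ctx.out << x; }, v);
        }
        ctx.out << '\n';
    }
};
```

The statement never needs to know which concrete type an argument produces: every expression returns the same Value type, and std::visit selects the right output operator at run time. This is the reduction in evaluator complexity that a variant type is meant to bring.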