agora-basic-compiler(1)                                agora-basic-compiler(1)

NAME
       agora-basic-compiler - ANS Full BASIC Compiler

SYNOPSIS
       agora-basic-compiler [ options ] source.bas

DESCRIPTION
       Agora BASIC is a (batch) compiler for the BASIC programming language. Its dialect follows Full BASIC (see ANSI INCITS 113-1987), relaxing some of its limitations. At this time, only a small subset of the language is supported.

       The compiler has two phases. The first phase translates BASIC into C; the system C compiler is used as the second phase, to generate the object code and executables. The intermediate C code is not intended for human consumption.

LANGUAGE
       The Agora BASIC language follows ANSI Full BASIC; any supported Full BASIC construct will work as described in the standard if used within the constraints specified by the standard, modulo bugs in the compiler.

   Lexical conventions
       An input file is treated as a sequence of lines. Lines are terminated by the line feed character. If a line starts with the & character, that line (sans the & character) is treated as a continuation of the preceding line. Lines that start with the exclamation mark are ignored.

       Every line (except for continuation lines) starts with a line number (an unsigned base-10 integer), and these line numbers must be in ascending order. Line numbers are used mostly as targets for old-style control-flow commands like GOTO, but they are also an integral part of the exception mechanism.

       An exclamation mark outside a string literal starts a comment that extends to the end of the line (including any continuation lines as part of the comment).

       An identifier starts with a letter or the underscore character and extends as far to the right as possible, consisting of letters, underscore characters and digits, and optionally ending with the dollar sign.
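       As an illustration only, the identifier syntax just described can be written as a small C scanner. The function name identifier_length is hypothetical (it is not part of the compiler), and "letter" is assumed to mean an ASCII letter as classified by isalpha in the C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the identifier at the start
 * of s, or 0 if s does not start with one.  Assumes the "C" locale, so
 * isalpha/isalnum accept only ASCII letters and digits. */
size_t identifier_length(const char *s)
{
    size_t n;

    /* First character: a letter or an underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;

    /* Extend as far to the right as possible over letters, digits
     * and underscores. */
    n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;

    /* An optional trailing dollar sign ends the identifier. */
    if (s[n] == '$')
        n++;

    return n;
}
```

       For example, scanning "TOTAL_2$ = 1" stops right after the dollar sign, giving a length of 8.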
       Standard BASIC requires that identifiers be at most 31 characters long; Agora BASIC considers only the first 31 characters significant (if the identifier ends with the dollar sign, that character is counted as one of the 31 characters). Identifiers are case-insensitive. An identifier has the string type if it ends with the dollar sign, and the numeric type otherwise.

       Some identifiers are reserved for use as keywords; these are ELSE, NOT, PRINT and REM. Some identifiers cannot be used as variable names, as they denote intrinsic nullary functions; these are CON, DATE, DATE$, EXLINE, EXTYPE, IDN, MAXNUM, NUL$, PI, RND, TIME, TIME$, TRANSFORM and ZER.

       A string literal starts and ends with the double-quote character; however, two double-quote characters in a row (excluding the starting double-quote character) are interpreted as a single literal double-quote character in the string itself, and do not terminate the string literal. String literals cannot contain newlines, but ...

       A numeric literal has three parts: the integer part, the fraction part and the exponent part; all three are optional, with the proviso that either the integer part or the fraction part must be present. The integer part starts with an optional sign (+ or -) and continues as an unsigned base-10 integer. The fraction part starts with a full stop and continues as an unsigned base-10 integer. The exponent part starts with the letter e (in either case), followed by an optional sign and an unsigned base-10 integer. The literal is interpreted as a base-10 floating-point number in the usual way.

   Numeric expressions
       In Standard Full BASIC, and hence in Agora BASIC, numbers are represented in the machine using the decimal (base-10) floating-point representation, with 7-digit precision (ranging from -9999999e127 to +9999999e127) and with the machine epsilon being 1e-128.
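       The numeric literal grammar given under Lexical conventions can be made concrete with a small recognizer. This is a sketch, not the compiler's lexer: the name is_numeric_literal is hypothetical, and the rules are read strictly, so a sign must be followed by integer-part digits (a literal such as -.5 is rejected) and a full stop must be followed by at least one digit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Skip a run of decimal digits; return how many were consumed. */
static size_t skip_digits(const char **p)
{
    size_t n = 0;
    while (isdigit((unsigned char)**p)) {
        (*p)++;
        n++;
    }
    return n;
}

/* Hypothetical helper: return true if the whole of s is a numeric
 * literal under a strict reading of the grammar above. */
bool is_numeric_literal(const char *s)
{
    const char *p = s;
    bool has_int = false, has_frac = false;

    /* Integer part: optional sign followed by digits.  Assumption: the
     * sign belongs to the integer part, so it needs digits after it. */
    if (*p == '+' || *p == '-') {
        p++;
        if (skip_digits(&p) == 0)
            return false;
        has_int = true;
    } else {
        has_int = skip_digits(&p) > 0;
    }

    /* Fraction part: a full stop followed by at least one digit. */
    if (*p == '.') {
        p++;
        if (skip_digits(&p) == 0)
            return false;
        has_frac = true;
    }

    /* Either the integer part or the fraction part must be present. */
    if (!has_int && !has_frac)
        return false;

    /* Exponent part: e or E, an optional sign, at least one digit. */
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (skip_digits(&p) == 0)
            return false;
    }

    /* Nothing may follow the literal. */
    return *p == '\0';
}
```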
       A simple numeric variable reference is any non-reserved identifier having the numeric type which has been declared earlier in the same program unit as a variable. It can be used as a numeric expression, and its value is the value currently bound to the identifier.

       A numeric array element reference begins with a non-reserved identifier having the numeric type which has been declared earlier as an array with n dimensions. The identifier is then followed by a sequence of n comma-separated numeric expressions (called the indexes) in parentheses. This construct can be used as an expression; to determine its value, the index expressions are evaluated and their values are rounded to the nearest integers (halves upward); these integers must then be within the ranges declared for the respective dimensions. The value of the expression is then the value bound to the indicated element of the array.

       A numeric literal is a numeric expression, denoting its own value. If the value overflows, the exception 1001 is thrown. If the value underflows, the recoverable exception 1501 is signalled; the default recovery action is to use zero instead of the value of the literal.

       A numeric function call is the name of a declared or intrinsic numeric function followed by comma-separated expressions or array references (the arguments) in parentheses. When no arguments are provided, the parentheses are optional. The number and type of arguments must match the declared parameters of the function. The call is a numeric expression, and its value is obtained by first evaluating the arguments and then invoking the function, binding the argument values to the function parameters; the value of the call expression is the value returned by the function.

       Numeric expressions can be formed by combining numeric expressions with an operator.
       The usual arithmetic operators are available: + for addition, - for subtraction and negation, * for multiplication, / for division and ^ for exponentiation. Operators follow standard algebraic precedence rules; they are all left-associative. Division by zero throws the 3001 exception; overflow throws the 1002 exception, and underflow signals the recoverable 1502 exception (with the default recovery action of making the result zero).

   String expressions
       A simple string variable reference is any non-reserved identifier having the string type which has been declared earlier in the same program unit as a variable. It can be used as a string expression, and its value is the value currently bound to the identifier.

       A string array element reference begins with a non-reserved identifier having the string type which has been declared earlier as an array with n dimensions. The identifier is then followed by a sequence of n comma-separated numeric expressions (called the indexes) in parentheses. This construct can be used as an expression; to determine its value, the index expressions are evaluated and their values are rounded to the nearest integers (halves upward); these integers must then be within the ranges declared for the respective dimensions. The value of the expression is then the value bound to the indicated element of the array.

       Both a simple string variable reference and a string array element reference may be suffixed with a substring reference: two numeric expressions separated by a colon, in parentheses. These two expressions are evaluated and their values are rounded to integers like array indexes; these two integers, N and M, specify that the value of the expression is the substring of the named string (variable or array element) starting from the Nth and ending with the Mth character (inclusive). It is not an error for N or M to be out of range, but if N > M, the substring is empty.
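       A sketch in C of how the index rounding and substring rules above fit together. It uses C double rather than the decimal representation described under Numeric expressions, the names round_half_up and substring are hypothetical, and it assumes that an out-of-range N or M is clamped to the ends of the string (this page only says that out-of-range values are not an error).

```c
#include <stdlib.h>
#include <string.h>

/* Round to the nearest integer, halves upward (toward +infinity);
 * computes floor(x + 0.5) without needing libm. */
long round_half_up(double x)
{
    double y = x + 0.5;
    long r = (long)y;            /* truncates toward zero */
    if ((double)r > y)           /* correct negative non-integers to floor */
        r--;
    return r;
}

/* Return a newly allocated copy of s(n:m), 1-based and inclusive.
 * The caller must free the result; NULL is returned if malloc fails. */
char *substring(const char *s, double n_expr, double m_expr)
{
    long len = (long)strlen(s);
    long n = round_half_up(n_expr);
    long m = round_half_up(m_expr);
    long count;
    char *out;

    /* Assumption: out-of-range bounds are clamped to the string. */
    if (n < 1)
        n = 1;
    if (m > len)
        m = len;

    /* N > M (after clamping) yields the empty string. */
    count = (n > m) ? 0 : m - n + 1;

    out = malloc((size_t)count + 1);
    if (out == NULL)
        return NULL;
    if (count > 0)
        memcpy(out, s + (n - 1), (size_t)count);
    out[count] = '\0';
    return out;
}
```

       With these rules, A$(2:4) of "HELLO" is "ELL", and A$(1.5:2.4) rounds both bounds to 2 and yields "E".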
       A string literal is a string expression, denoting its own value.

       A string function call is the name of a declared or intrinsic string function followed by comma-separated expressions or array references (the arguments) in parentheses. When no arguments are provided, the parentheses are optional. The number and type of arguments must match the declared parameters of the function. The call is a string expression, and its value is obtained by first evaluating the arguments and then invoking the function, binding the argument values to the function parameters; the value of the call expression is the value returned by the function.

       String expressions can be formed by combining string expressions with the & operator; it yields the concatenation of the two operand strings.

   Relational expressions
       The basic comparison operators are < (strictly less than), <= and =< (less than or equal to), >= and => (greater than or equal to), > (strictly greater than), = (equal to), and <> and >< (not equal to). They require two operands of the same type (either numeric or string); the resulting expressions are relational.

       Relational expressions may be combined using the unary NOT operator and the binary AND and OR operators. These have the customary logical semantics (OR is inclusive, not exclusive). In terms of precedence, NOT binds the strongest, AND the second strongest and OR the weakest.

       Relational expressions can only be used in special contexts; they don't have a first-class value, so their value cannot be stored in a variable or printed out.

   Commands
       CALL   The command keyword is followed by what syntactically looks like a function call. The named "function", however, is resolved as a subroutine name; the subroutine is invoked by this command.

       DECLARE EXTERNAL SUB, DECLARE EXTERNAL FUNCTION
              The command phrase is followed by a comma-separated list of non-reserved identifiers.
              These identifiers are declared as external subroutine or external function names, depending on the command. Declaring a name as an external subroutine makes that name available as a command name, with the syntax and semantics of the CALL command for that subroutine, sans the CALL keyword.

       DECLARE NUMERIC
              The command phrase is followed by a comma-separated list of non-reserved numeric identifiers. These identifiers are declared as numeric local variables.

       DIM    Declares one or more array variables. The command takes as arguments a list of array specifications separated by commas, where each specification has the form "a(N1 TO M1, ..., Nn TO Mn)"; here a is a currently undeclared numeric or string variable name, and Ni and Mi are numeric expressions with each Ni < Mi, Ni being the smallest index and Mi the largest index for the ith dimension. A "Ni TO" may be missing, in which case Ni is taken as 1. Standard BASIC allows up to three dimensions; Agora BASIC imposes no limitations except what the underlying C compiler can handle.

       EXTERNAL FUNCTION, EXTERNAL SUB
              Independent ("external" in Standard BASIC lingo) functions and subroutines are introduced by the EXTERNAL FUNCTION and EXTERNAL SUB commands. Both take the name of the function or subroutine, followed by a parameter list in parentheses. In the parameter list, parameters are separated by commas. Each parameter declares a local variable in the function or subroutine, initialized at call time to the value of the corresponding argument. Parameters may also be arrays; an array parameter consists of the parameter name followed by zero or more commas in parentheses. The number of dimensions of an array parameter is one more than the number of commas. Channel parameters are not supported at this time.

              In external functions, the function name itself is an uninitialized local variable. When the function exits, the value of that variable is used as the return value.
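              To illustrate the parameter and return-value conventions, here is a hand-written C function that behaves like the hypothetical external function shown in its comment. It is not the compiler's actual output (the generated C is not meant for human consumption, and it uses decimal rather than C double arithmetic); the names CLAMP and CLAMP_value are invented for the example.

```c
/* Hypothetical BASIC source:
 *
 *   200 EXTERNAL FUNCTION CLAMP(X, LO, HI)
 *   210 IF X < LO THEN LET X = LO
 *   220 IF X > HI THEN LET X = HI
 *   230 LET CLAMP = X
 *   240 END FUNCTION
 */
double CLAMP(double X, double LO, double HI)
{
    double CLAMP_value;      /* the function name: an uninitialized local */

    /* Parameters are locals initialized from the arguments, so these
     * assignments do not affect the caller. */
    if (X < LO)
        X = LO;
    if (X > HI)
        X = HI;

    CLAMP_value = X;
    return CLAMP_value;      /* its value at exit is the return value */
}
```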
              External functions and subroutines share no state with the main program or with other external functions and subroutines. They can communicate only through parameters and return values.

              Each EXTERNAL FUNCTION or EXTERNAL SUB command must be followed by a matching END FUNCTION or END SUB command, respectively, with no other such commands in between. The lines in between form the body of the function or subroutine. External functions and subroutines may not be nested inside other external functions and subroutines, nor inside main programs. The EXTERNAL FUNCTION and EXTERNAL SUB commands are valid only after an END, END FUNCTION or END SUB command.

              External functions and subroutines, like the main program, are classified as program units.

       DO     Begins a DO loop. There must be a corresponding LOOP command later in the same program unit. The DO command takes an optional "WHILE test" or "UNTIL test" parameter, where "test" is a relational expression. If "test" is provided, then at the beginning of each iteration it is evaluated, and if its value is true (for "UNTIL") or false (for "WHILE"), the loop is terminated and control is passed to the line following the corresponding LOOP command. DO loops form a nesting block; a DO loop may be nested in other nesting blocks, but it must not be interleaved with other nesting blocks. In other respects, a DO command has no effect.

       END    Signals the end of the main program; cannot be used elsewhere.

       END FUNCTION, END SUB
              See the commands EXTERNAL FUNCTION and EXTERNAL SUB.

       EXIT FUNCTION, EXIT SUB
              These commands are valid only inside external functions and subroutines, respectively. Their execution causes the immediate termination of the program unit in question, causing a return to the calling unit.

       FOR    Begins a FOR loop. There must be a corresponding NEXT command in a later line in the same program unit.
              The full command has the form "FOR i = e1 TO e2 STEP e3", where "e1", "e2" and "e3" are numeric expressions and "i" is a simple numeric variable, called the induction variable of this loop. The "STEP e3" part may be omitted, in which case "e3" has the value 1. A FOR loop must not be nested in another FOR loop having the same induction variable.

              When FOR is executed, first "e1", "e2" and "e3" are evaluated (and the values remembered); then "i" is implicitly declared, if necessary, and the value of "e1" is assigned to it. Then control is passed to the following line. When the corresponding NEXT command is executed, the (remembered) value of "e3" is added to "i", and the resulting value of "i" is compared to the (remembered) value of "e2". If "i" has not passed that limit (that is, if "i" is less than or equal to "e2" for a positive step, or greater than or equal to "e2" for a negative step), a new iteration is started by passing control to the line following the FOR command; otherwise the loop is exited and control passes to the line following the NEXT command (and the remembered values are forgotten).

              FOR loops form a nesting block; a FOR loop may be nested in other nesting blocks, but it must not be interleaved with other nesting blocks.

       IF     The IF command has two forms: single-line and block. The single-line form has the structure "IF test THEN command ELSE command", where "ELSE command" may be omitted. The block form has the structure

                     k IF test THEN
                       ...
                     l ELSEIF test THEN
                       ...
                     m ELSE
                       ...
                     n END IF

              where k, l, m and n are (increasing) line numbers and the tests are relational expressions. The semantics are the usual ones: the tests are evaluated in order, the commands following the first true test are executed, and if no test is true, the commands following ELSE (if present) are executed.

       LET    A LET command has the form "LET x1,...,xN = e", where x1,...,xN are numeric or string variable references (all having the same type!) and e is an expression having the same type as the variables. Both numeric and string variable references can be simple named variables or array element references; a string variable reference may also include a substring specification.
              The expression is first evaluated, as are any subexpressions of the variable references, and then the value of the expression is assigned to the referenced variables. Assignment to a substring modifies the underlying string variable by replacing the referenced substring with the string denoted by the expression, even if that string is of different length than the substring (this will cause the length of the underlying string variable to change accordingly). Simple variables are implicitly declared, as necessary, by being on the left side of a LET command; array variables need to be declared separately.

       LOOP   Ends a DO loop. There must be a corresponding DO command earlier in the same program unit. The LOOP command takes an optional "WHILE test" or "UNTIL test" parameter, where "test" is a relational expression. If "test" is provided, then at the end of each iteration it is evaluated; if its value is true (for "WHILE") or false (for "UNTIL"), a new iteration is started by passing control to the corresponding DO command, and otherwise control passes to the following line. If "test" is not provided, LOOP behaves as if a "WHILE test" had been provided for some "test" that always evaluates to true.

       NEXT   Ends a FOR loop. There must be a corresponding FOR command in an earlier line in the same program unit. The command takes one argument, which must be the name of a numeric variable, and that variable must be the induction variable of this loop. Otherwise, the behaviour of NEXT is described in the entry for the FOR command.

       PRINT  Writes to the standard output stream the character sequence described by the command's arguments: expressions or TAB-specifications separated by commas or semicolons. The value of a numeric expression is converted into a textual representation of the value; strings are printed verbatim.
              A comma separator specifies that the next item is aligned on the next tab stop (initially, tab stops are 20 characters apart); a semicolon indicates that the next item is printed immediately following the previous item. A line separator is inserted immediately before an item if the current line exceeds the current print margin (initially, 80 characters). TAB-specifications are currently ignored. The command may be terminated by a comma or a semicolon, in which case that separator is treated as separating the last item of this command from the first item of the next PRINT command encountered. If the command ends at an item (with no terminating comma or semicolon), or if the command has no arguments, a line separator is appended to the output.

       PROGRAM
              An optional command that can be used on the first line of a program. Requires an identifier (the program name) and a list of arguments (see the EXTERNAL FUNCTION and EXTERNAL SUB commands). The name and the arguments are currently ignored.

       REM    A full-line comment; equivalent to using the exclamation character after the line number.

OPTIONS
       Where options take arguments, the argument must be provided in the same argument slot; that is, -O2 is allowed but -O 2 is not.

       -no-delete
              Do not delete the intermediate files when done.

       -Wno   Disable warnings. (Warnings are enabled by default.)

       -Werror
              Treat warnings as errors.

       -fnative
              Use native binary floating-point numbers instead of the software-based decimal floating-point numbers. NOTE: This option is potentially dangerous; do not use it unless you know what you are doing. Currently this option is broken.

       -fno-line-numbers
              Lift the requirement for line numbers in source code, in fact making them forbidden. The compiler will internally count logical lines and use these counts as line numbers where they are required. Note that this option makes this compiler non-compliant with Standard BASIC.

       -c, -S These options are passed to the underlying C compiler.
       -O, -cstd=, -I, -L, -o
              These options are passed to the underlying C compiler, and they take an argument.

       -cflag=
              Pass this option's argument to the underlying C compiler.

TEMPORARY FILES
       The compiler creates input.bas.c and input.bas.h, overwriting any existing files of those names, where input.bas is the source file name provided on the command line.

CONFORMING TO
       ANSI INCITS 113-1987 (partially)

BUGS
       The following are known deficiencies in the implemented parts of the language. Completely missing commands and intrinsic functions are not listed.

       - TAB-specifications are parsed but otherwise ignored in PRINT statements.

       - The arguments to the PROGRAM command are parsed but otherwise ignored.

       - Channel arguments to functions and subroutines are not supported.

       The following are known deficiencies in the compiler (other than language issues):

       - The -fnative option is broken (missing support from the runtime library).

AUTHOR
       Antti-Juhani Kaijanaho

SEE ALSO
       gcc(1)

Agora Basic                        2006-05-04          agora-basic-compiler(1)