
2. Lexical Structure

This section gives an informal account of the lexical conventions used in writing Better Scheme programs.

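Upper and lower case forms of a letter are always remembered (except within variable references) but never distinguished: case is preserved but is not significant. For example, Foo is the same identifier as FOO and #\xff is the same character as #\XFF. Exceptions are noted where they apply: single-character literals (section 2.3.3) and string escape sequences (section 2.3.4) are case-sensitive.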

2.1 Identifiers

Most identifiers allowed by other programming languages are also acceptable in Better Scheme. The precise rules for forming identifiers vary among implementations of Better Scheme, but in all implementations a sequence of letters, digits, and "extended alphabetic characters" is an identifier. In addition, +, -, and ... are identifiers. Here are some examples of identifiers:

lambda q
list->vector soup
+ V17a
<=? a34kTMNs
42 5ducks
the-word-recursion-has-many-meanings

Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:

! $ % & * + - . / : < = > ? @ ^ _ ~
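
The identifier rule above can be sketched as a small check. This is a Python illustration rather than part of Better Scheme; the name is_identifier is hypothetical, and it assumes that "letters" and "digits" mean ASCII letters and digits. It does not special-case a lone "." used as the cons dot.

```python
import string

# Extended alphabetic characters, copied from the list above.
EXTENDED = set("!$%&*+-./:<=>?@^_~")

# Assumption: "letters" and "digits" are the ASCII ones only.
ALLOWED = set(string.ascii_letters) | set(string.digits) | EXTENDED


def is_identifier(token: str) -> bool:
    """Return True if token is a non-empty run of letters, digits, and extended characters."""
    return len(token) > 0 and all(ch in ALLOWED for ch in token)
```

Under this sketch, list->vector, <=?, 5ducks, and ... are all accepted, while anything containing whitespace or a parenthesis is rejected.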

Identifiers have three interpretations within Better Scheme programs:

  1. Some identifiers are interpreted as literals (see section 2.3 Literals).
  2. When an identifier appears as a literal or within a literal, it denotes a symbol (see section 5.6.3 Symbols).
  3. Any symbol may be used as a variable name.

2.2 Whitespace and Comments

Whitespace characters are spaces, tabs and newlines (implementations typically provide additional whitespace characters such as page break). Whitespace is used for improved readability and is necessary to separate tokens from each other, a token being an indivisible lexical unit such as an identifier or number, but is otherwise insignificant. Whitespace may occur between any two tokens, but not within a token. Whitespace may also occur inside a string, where it is significant.

A semicolon ";" indicates the start of a comment. The comment continues to the end of the line on which the semicolon appears. Comments are invisible to Scheme, but the end of the line is visible as whitespace. This prevents a comment from appearing in the middle of an identifier or number.

For example:

;; The fact function computes the factorial
;; of a non-negative integer.
(define fact
  (lambda (n)
    (if (= n 0)
        1        ;Base case - return 1
        (* n (fact (- n 1))))))
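
As an illustration of the comment rule, here is a minimal Python sketch of how a reader might drop a comment from one line of source. The function name strip_comment is hypothetical. The sketch assumes a semicolon inside a string literal does not start a comment, and it does not handle character literals such as #\; or #\".

```python
def strip_comment(line: str) -> str:
    """Remove a ';' comment from one source line, ignoring ';' inside string literals."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the character after a backslash
            elif ch == '"':
                in_string = False   # closing double quote
        elif ch == '"':
            in_string = True        # opening double quote
        elif ch == ";":
            return line[:i]         # comment runs to the end of the line
        i += 1
    return line
```

Applied to the line "1;Base case - return 1" this returns "1". The end of the line still acts as whitespace, so the following token stays separate.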

2.3 Literals

A literal is an identifier or other syntax which directly represents a Better Scheme entity, rather than a variable which refers to a location containing some entity. When literals are encountered in the program text, the value they represent is substituted for them. When one of those values is displayed, it is generally represented as the literal corresponding to it.

2.3.1 Boolean Literals

The two boolean literals are written #t and #f. However, all standard conditional functions treat every entity as true (including null) except for the literal #f. The phrase a true value (or sometimes just true) means any entity treated as true by the conditional functions, and the phrase a false value (or false) means any entity treated as false by the conditional functions.

literal macro: (#t <true-expression> <false-expression>)

True is a macro which evaluates to its first argument. The second argument is ignored and not evaluated. The two arguments may be curried.

literal macro: (#f <true-expression> <false-expression>)

False is a macro which evaluates to its second argument. The first argument is ignored and not evaluated. The two arguments may be curried.
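
The selecting behaviour of #t and #f can be modelled outside Better Scheme. The Python sketch below is only an analogy: unevaluated argument expressions are represented as zero-argument functions (thunks), the names true and false are hypothetical, and the curried calling style is one reading of "the two arguments may be curried."

```python
def true(then_thunk):
    """Model of #t: accept the first argument, then ignore the second without calling it."""
    def take_second(else_thunk):
        return then_thunk()      # only the first expression is evaluated
    return take_second


def false(then_thunk):
    """Model of #f: accept and ignore the first argument, then evaluate the second."""
    def take_second(else_thunk):
        return else_thunk()      # only the second expression is evaluated
    return take_second
```

For instance, true(lambda: "yes")(lambda: 1 / 0) returns "yes" without raising, because the second thunk is never called.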

2.3.2 List Literals

literal: (<expression> ...)

A list is a sequence of expressions enclosed by parentheses and separated by spaces. The standard semantics of Scheme evaluate lists as invocations (see section 4.3 Invocation). A list can be included as a literal by quoting it (see section 4.2 Literal Expressions).

literal: (<expression1> <expression> ... . <tail-expression>)

An improper list, that is, a list whose tail is not a list, may be formed by use of the cons dot. The cons dot is placed before the last element of the list and separated from any surrounding identifiers by whitespace, lest it be taken as part of an identifier. The last element is then taken as the tail of the list. The value after the cons dot can itself be a list, so the expression "'(1 . (2 . (3 . ())))" evaluates to (1 2 3).
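
To make the relationship between dotted notation and ordinary lists concrete, here is a small Python model of pairs. It is an illustration only: pairs are represented as 2-tuples, NIL stands in for the empty list, and the helper names cons and unchain are hypothetical.

```python
NIL = None  # stands in for the empty list ()


def cons(head, tail):
    """Build one pair, written (head . tail) in dotted notation."""
    return (head, tail)


def unchain(pair):
    """Walk a chain of pairs and return (items, tail); tail is NIL for a proper list."""
    items = []
    while isinstance(pair, tuple):
        items.append(pair[0])
        pair = pair[1]
    return items, pair
```

In this model, cons(1, cons(2, cons(3, NIL))) walks to the items 1, 2, 3 with tail NIL, matching (1 2 3), while cons(1, cons(2, 3)) ends with the tail 3, matching the improper list (1 2 . 3).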

2.3.3 Character Literals

literal: #\X

A pound sign followed by a backslash and a character is the character literal for that character. The character is case-sensitive: #\a and #\A are different characters.

literal: #\xFF
literal: #\xFFF
literal: #\xFFFF

A pound sign followed by a backslash, the letter x, and two to four hexadecimal digits is the character literal for the ASCII or Unicode character with that code. The three-digit form is taken to have an implicit leading zero to represent a Unicode character.

literal: #\space

The character literal for the space character.

literal: #\tab

The character literal for the tab character.

literal: #\newline

The character literal for the newline character.
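
The character literal forms in this section can be summarised with a short Python sketch of a reader. The function name read_char_literal is hypothetical. Treating the names space, tab, and newline and the x marker as case-insensitive is an assumption based on the general case rule at the start of this section.

```python
import re

NAMED = {"space": " ", "tab": "\t", "newline": "\n"}

# The letter x followed by two to four hexadecimal digits.
HEX_FORM = re.compile(r"[xX][0-9A-Fa-f]{2,4}")


def read_char_literal(text: str) -> str:
    """Return the character denoted by a literal such as #\\a, #\\x41, or #\\space."""
    if not text.startswith("#\\") or len(text) < 3:
        raise ValueError(f"not a character literal: {text!r}")
    body = text[2:]
    if len(body) == 1:
        return body                        # single character, case preserved
    if body.lower() in NAMED:              # assumption: names ignore case
        return NAMED[body.lower()]
    if HEX_FORM.fullmatch(body):
        return chr(int(body[1:], 16))      # leading zeros are implicit
    raise ValueError(f"unknown character literal: {text!r}")
```

Note that #\x on its own is read as the letter x, since the hexadecimal form needs at least two digits.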

2.3.4 String Literals

literal: "<character>..."

A string is a sequence of characters enclosed in double quotes. Escape sequences allow string literals to contain double quotes and characters that cannot be written directly. Within a string literal the following (case-sensitive) escape sequences can be used:

\n Newline
\r Carriage Return
\0 Null
\t Tab
\\ Backslash
\" Double Quote
\xFF ASCII literal expressed in hexadecimal
\uFFFF Unicode literal expressed in hexadecimal

A backslash followed by any other sequence of characters is not treated as an escape sequence.

Note: the hexadecimal digits in the ASCII and Unicode escape codes are case-insensitive.

Note: the characters 'f' and 'b' are reserved for future escape sequences. It is strongly recommended that one escape backslashes that appear before these characters.
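
The escape table above can be expressed as a small decoder. The Python sketch below is illustrative only; the name decode_escapes is hypothetical, and it assumes that a backslash which does not begin a listed escape sequence is kept in the string unchanged.

```python
import re

SIMPLE = {"n": "\n", "r": "\r", "0": "\0", "t": "\t", "\\": "\\", '"': '"'}

# Escape letters are case-sensitive; the hexadecimal digits are not.
ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[nr0t\\"])')


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the text between a string literal's quotes."""
    def replace(match):
        code = match.group(1)
        if code[0] in "xu":
            return chr(int(code[1:], 16))   # \xFF or \uFFFF
        return SIMPLE[code]
    # Assumption: any other backslash sequence is left exactly as written.
    return ESCAPE.sub(replace, body)
```

With this sketch, the body \x41\u03bb decodes to the two characters A and λ, and \q is left as a backslash followed by q.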

2.3.5 Vector Literals

literal: [ <expression> ...]

A vector is a sequence of expressions enclosed by square brackets and separated by spaces. The standard semantics of Scheme evaluate vectors as invocations (see section 4.3 Invocation). A vector can be included as a literal by quoting it (see section 4.2 Literal Expressions).

2.3.6 Numeric Literals

The syntax of the written representations for numbers is described formally in section 7.1.1 Lexical structure. Note that case is not significant in numerical constants.

A number may be written in binary, octal, decimal, or hexadecimal by the use of a radix prefix. The radix prefixes are '#b' (binary), '#o' (octal), '#d' (decimal), and '#x' (hexadecimal). With no radix prefix, a number is assumed to be expressed in decimal.

A numerical constant may be specified to be either exact or inexact by a prefix. The prefixes are '#e' for exact, and '#i' for inexact. An exactness prefix may appear before or after any radix prefix that is used. If the written representation of a number has no exactness prefix, the constant may be either inexact or exact: it is inexact if it contains a decimal point or an exponent; otherwise it is exact.
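
The prefix rules can be sketched as follows. This Python illustration handles integer constants only, which is a simplification; the name read_integer is hypothetical, and exact values are modelled as Python int while inexact values are modelled as float.

```python
RADIX = {"b": 2, "o": 8, "d": 10, "x": 16}


def read_integer(text: str):
    """Read an integer constant with optional radix and exactness prefixes, in either order."""
    radix, exact = 10, True                 # defaults: decimal, exact
    seen_radix = seen_exactness = False
    rest = text.lower()                     # case is not significant in numeric constants
    while rest.startswith("#"):
        marker = rest[1:2]
        if marker in RADIX and not seen_radix:
            radix, seen_radix = RADIX[marker], True
        elif marker in ("e", "i") and not seen_exactness:
            exact, seen_exactness = (marker == "e"), True
        else:
            raise ValueError(f"bad prefix in {text!r}")
        rest = rest[2:]
    value = int(rest, radix)                # raises ValueError on invalid digits
    return value if exact else float(value)
```

Here "#x1F" reads as the exact integer 31, and "#i#xFF" and "#x#iFF" both read as the inexact value 255.0.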

In systems with inexact numbers of varying precisions it may be useful to specify the precision of a constant. For this purpose, numerical constants may be written with an exponent marker that indicates the desired precision of the inexact representation. The letters 's', 'f', 'd', and 'l' specify the use of short, single, double, and long precision, respectively. (When fewer than four internal inexact representations exist, the four size specifications are mapped onto those available. For example, an implementation with two internal representations may map short and single together and long and double together.) In addition, the exponent marker 'e' specifies the default precision for the implementation. The default precision has at least as much precision as double, but implementations may wish to allow this default to be set by the user.

3.14159265358979F0 => 3.141593 ;Round to single
0.6L0 => .600000000000000 ;Extend to long
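
The effect of the exponent markers can be imitated in Python. The sketch below assumes an implementation with exactly two internal representations, IEEE single and double precision, with short and single mapped to single and with double, long, and the default marker 'e' mapped to double. The name read_inexact is hypothetical, and only decimal constants are handled.

```python
import re
import struct

# Mantissa, then an optional exponent marker and signed exponent.
NUMBER = re.compile(r"([+-]?[0-9.]+)(?:([sfdle])([+-]?[0-9]+))?")


def read_inexact(text: str) -> float:
    """Read a decimal constant, rounding to single precision for the 's' and 'f' markers."""
    match = NUMBER.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"not a decimal constant: {text!r}")
    mantissa, marker, exponent = match.groups()
    value = float(f"{mantissa}e{exponent or '0'}")   # Python floats are doubles
    if marker in ("s", "f"):
        # Round-trip through a 4-byte float to discard the extra precision.
        value = struct.unpack("f", struct.pack("f", value))[0]
    return value
```

Reading 3.14159265358979F0 this way gives a value that differs from the double 3.14159265358979 after about the seventh significant digit, which corresponds to the "round to single" example above.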

2.3.7 Void

Just as null is a value representing the absence of a string pair, void is a value representing the absence of a function.

literal macro: (#void <expression> ...)

Void is a macro which evaluates to itself and does not evaluate its arguments.

2.4 Other Notations

Beyond whitespace, comments, identifiers, and literals, there are a few other lexical considerations in Better Scheme.

2.4.1 Evaluative Operator

syntax: ,<expression>

The evaluative operator (see section 4.4 Evaluative Operator) is the comma. It may be placed in front of any expression to form a new expression which is the evaluation of the first. Since expressions in many contexts are evaluated anyway, this can cause a "double evaluation."

2.4.2 Quote Operator

syntax macro: '<expression>

An expression may be quoted by placing a single quote before it, thereby forming a literal expression (see section 4.2 Literal Expressions).

2.4.3 Reserved Notation

The following characters are reserved for possible future extension to the language.

{ } |




jwalker@cs.oberlin.edu