Building an Ad Hoc scanner in Python


CS365 – Organization of Programming Languages
Program 2

Objective
Learn the basics of programming an ad hoc language scanner.

Due Date
3/3/21

Assignment
Build the start of an ad hoc scanner to handle a simple language. The output of the scanner will be an ordered list in which each entry gives the type of the token, a comma, and the actual value of the token (the token and the lexeme).

• This must be written in Python.

• If you create more than one Python file, compress all of your files into a single file for uploading.

• You must use file input. Pick up the source code file name from the command line. Exit gracefully with an error message if the input file does not exist.

• Write the output to the screen.

• There is no error checking required of file names OTHER than to ensure that they are present; your program will not be given an illegal file name when testing.

• There is no guarantee of whitespace except where required to separate tokens. A line feed is a form of whitespace. There will not be any tab characters in the source code.

• An <id> token is a string of alphanumeric characters that starts with a letter (upper or lower case ‘a’-‘z’) and is followed by zero or more letters or digits ‘0’-‘9’ (standard identifier rules without the underscore character). An <id> cannot be one of the reserved words in the grammar.

• A <number> can be either an integer or a floating point value. A floating point value WILL have at least one digit both before and after the decimal point.

• An <lparen> token is a (

• An <rparen> token is a )

• An <add_op> token is either a + or -

• A <mult_op> token is either a *, /, //, or %.

• A <rel_op> token is either a <, >, <=, >=, ==, or !=

• The <assign> token, the assignment operator, is the equal sign (“=”).

• The “reserved” words in the grammar are all lowercase and case-sensitive. They include: read, write, if, and else. Their respective tokens are <read>, <write>, <if> and <else>.

• A # symbol indicates a comment. Ignore the remainder of the current physical line. This will not generate a token.

• If your program encounters text that is not a valid token, write out an <error> token and the invalid text (not the remainder of the file). Stop processing the input file after dealing with an error.
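
The file-input and comment requirements in the list above can be sketched roughly as follows. This is only one possible approach, not a required design: the helper names `read_source` and `strip_comment` are hypothetical, and the wording of the error messages is an assumption, since the assignment does not prescribe it.

```python
import sys


def read_source(argv):
    """Return the text of the file named in argv[1], or exit with a message.

    `read_source` is a hypothetical helper name, not part of the assignment.
    """
    if len(argv) < 2:
        # No file name on the command line (message text is an assumption).
        sys.exit(f"usage: python {argv[0]} <source-file>")
    try:
        with open(argv[1], "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: input file '{argv[1]}' does not exist")


def strip_comment(line):
    """Drop everything from the first '#' to the end of the physical line."""
    return line.split("#", 1)[0]
```

A driver would typically call `read_source(sys.argv)` once and pass the returned text to the scanner.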

For example, if the input file contains:

    #sample “source code” for simple language
    read a
    read b
    c = a + b - 3
    write c

your output should contain:

    <read>, read
    <id>, a
    <read>, read
    <id>, b
    <id>, c
    <assign>, =
    <id>, a
    <add_op>, +
    <id>, b
    <add_op>, -
    <number>, 3
    <write>, write
    <id>, c
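
A minimal sketch of a hand-written (ad hoc) scanner that follows the token rules above is shown here. It is one possible structure, not the required solution. The names `scan`, `format_tokens`, and `SAMPLE` are hypothetical. Two points the assignment leaves open are decided by assumption: the error lexeme is taken to be the single offending character, and a digit run followed by a `.` with no digit after it is scanned as an integer followed by an error.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
RESERVED = {"read", "write", "if", "else"}
# Two-character operators are checked before one-character ones.
TWO_CHAR = {"//": "mult_op", "<=": "rel_op", ">=": "rel_op",
            "==": "rel_op", "!=": "rel_op"}
ONE_CHAR = {"+": "add_op", "-": "add_op", "*": "mult_op", "/": "mult_op",
            "%": "mult_op", "<": "rel_op", ">": "rel_op", "=": "assign",
            "(": "lparen", ")": "rparen"}


def scan(text):
    """Return (token, lexeme) pairs, stopping after the first error."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \r\n":                      # whitespace only separates tokens
            i += 1
        elif ch == "#":                        # comment: skip to end of line
            while i < n and text[i] != "\n":
                i += 1
        elif ch in LETTERS:                    # identifier or reserved word
            start = i
            while i < n and (text[i] in LETTERS or text[i] in DIGITS):
                i += 1
            word = text[start:i]
            tokens.append((word if word in RESERVED else "id", word))
        elif ch in DIGITS:                     # integer or floating point
            start = i
            while i < n and text[i] in DIGITS:
                i += 1
            # A fraction needs a '.' followed by at least one digit.
            if i + 1 < n and text[i] == "." and text[i + 1] in DIGITS:
                i += 1
                while i < n and text[i] in DIGITS:
                    i += 1
            tokens.append(("number", text[start:i]))
        elif text[i:i + 2] in TWO_CHAR:
            tokens.append((TWO_CHAR[text[i:i + 2]], text[i:i + 2]))
            i += 2
        elif ch in ONE_CHAR:
            tokens.append((ONE_CHAR[ch], ch))
            i += 1
        else:                                  # assumption: report one character
            tokens.append(("error", ch))
            break
    return tokens


def format_tokens(tokens):
    """Render each pair in the '<token>, lexeme' form used by the example."""
    return [f"<{kind}>, {lexeme}" for kind, lexeme in tokens]


SAMPLE = """#sample "source code" for simple language
read a
read b
c = a + b - 3
write c
"""

if __name__ == "__main__":
    for line in format_tokens(scan(SAMPLE)):
        print(line)
```

Run as a script, the block prints the thirteen lines of the example output, from `<read>, read` through `<id>, c`. Checking two-character operators first is what lets `//` and `<=` be recognized as single tokens rather than as two one-character tokens.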