Python help
JHU Computer Organization Module 10 Exercise
Building a Scanner/Parser
In this assignment you will use PLY, a scanner and parser generator tool, to create a scanner and parser
for a grammar that is a slight modification of the one in last week’s homework (shown below). PLY is a
Python implementation of the popular C tools lex and yacc. Note that using a generator is a simpler way
of creating a scanner or parser than writing either one manually.
Begin by reading the following sections of the PLY documentation (found here). The docs for PLY are
well written and easy to understand. I recommend reading the material closely and tracing through the
examples, as this will help immensely when you write your scanner/parser generator.
• Introduction
• PLY Overview
• Lex
o Lex example through Literal Characters
• Parsing Basics
• Yacc
o An example through Changing the starting symbol
You will generate a scanner (lexer) and parser for the following grammar:
<stmt> : <assign> | <binop> | <declare>
<assign> : <id> ‘=’ {<term> | <str>}
<binop> : {<term> (<literal> | <exp>) <term>} | {<str> ‘+’ <str>}
<declare> : <var> {<id> | <assign>}
<term> : <id> | <num> | <binop>
<literal> : ‘+’ | ‘-’ | ‘*’ | ‘/’
<id> : ID
<num> : NUM
<exp> : EXP
<str> : STR
<var> : ‘VAR ‘
Environment Setup
You may use any Python environment you’d like, but I recommend creating a web-based REPL. You can
find the source code for the PLY lexer (lex.py) and parser (yacc.py) generators here. Add two files to your
REPL project named lex.py and yacc.py. Copy and paste the code from lex.py and yacc.py in the PLY
repo into the lex.py and yacc.py files in your REPL project.
Building a Scanner
You will do your development in main.py. Copy the skeleton code for lexer_outline.py from my public
repo into the main.py file in your REPL. You will implement the methods (I recommend one at a
time) that map to each of the tokens that have been defined for you on lines 4-9. Comments have been
provided to help guide you in creating your scanner. To test, run your REPL. You will be prompted
to input a line of code. Here are two sample outputs from the completed scanner:
1. A line of code containing all valid tokens:
2. A line of code containing an invalid token caught by the scanner:
A PLY sample for generating a scanner for the ‘VAR ‘ token has been provided for reference here. I
recommend running it first and understanding how it works before adding additional tokens to it to
complete your scanner. Here is an example of the code in action on both a valid and an invalid token:
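To make the scanner's job concrete, here is a minimal sketch that uses only Python's standard re module rather than PLY. It is not the provided sample or the expected solution: the token patterns (in particular those for ID, NUM, EXP, and STR), the ASSIGN and SKIP names, and the helper name scan are all assumptions chosen for illustration, since the skeleton code defines the real ones. The idea is the one a scanner generator automates: try the token patterns at the current position and report any character that matches none of them.

```python
import re

# Hypothetical token patterns; the assignment's skeleton defines the real ones.
# Order matters: VAR is tried before ID, and EXP ('**') before LITERAL ('*').
TOKEN_SPEC = [
    ("VAR", r"VAR\b"),
    ("EXP", r"\*\*"),
    ("NUM", r"\d+"),
    ("STR", r'"[^"\n]*"'),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("LITERAL", r"[+\-*/]"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(line):
    """Return (tokens, errors) for one line of input."""
    tokens, errors, pos = [], [], 0
    while pos < len(line):
        match = MASTER.match(line, pos)
        if match is None:
            # No pattern matches here: record a lexical error and skip one character.
            errors.append((line[pos], pos))
            pos += 1
            continue
        if match.lastgroup != "SKIP":
            # Whitespace is consumed but never emitted as a token.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens, errors
```

For instance, scan("x = 2 $ 3") returns the tokens for x, =, 2, and 3 and records the character $ at index 6 as an error, which is the kind of input a scanner rejects before any grammar rule is consulted.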
Building a Parser
Next you’ll extend your working scanner generator code to generate a full lexical and syntactic analyzer.
You can run my sample scanner/parser generator using the code provided here. Here is an example of
both a syntactically valid and a syntactically invalid statement:
You’ll see in the code that ‘stmt’ is the entry point into the syntax analyzer and maps to a VAR token
(which was defined in the scanner you built above). The parser that is generated checks that the input
matches a VAR token and throws a syntax error if it does not.
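The two-stage check described above can be mimicked in a few lines of plain Python. This is a sketch for intuition only, not the sample code and not PLY: the names LexicalError, ParseError, tokenize, parse_stmt, and check are hypothetical, and the toy scanner knows only the VAR token. It shows how an input can fail at the scanner (no token matches) or pass the scanner and still fail at the parser (the token sequence matches no rule).

```python
class LexicalError(Exception):
    """Raised when a word matches no known token."""


class ParseError(Exception):
    """Raised when the token sequence matches no grammar rule."""


def tokenize(line):
    """Toy scanner: VAR is the only token it recognizes."""
    types = []
    for word in line.split():
        if word != "VAR":
            raise LexicalError(f"illegal token {word!r}")
        types.append("VAR")
    return types


def parse_stmt(token_types):
    """Toy parser for a single rule, stmt : VAR."""
    if token_types != ["VAR"]:
        raise ParseError(f"expected exactly one VAR token, got {token_types}")
    return "stmt"


def check(line):
    """Classify a line as VALID, a LEXICAL error, or a SYNTACTIC error."""
    try:
        parse_stmt(tokenize(line))
    except LexicalError as err:
        return f"LEXICAL: {err}"
    except ParseError as err:
        return f"SYNTACTIC: {err}"
    return "VALID"
```

Under these assumptions, check("VAR") is valid, check("x") fails in the scanner, and check("VAR VAR") passes the scanner but fails in the parser because two VAR tokens do not match the single rule.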
Your job in this portion of the assignment is to combine your scanner generator code with the additional
methods needed to enforce the grammar specified above (assign, declare, binop, etc.). For reference, my
solution contains 5 methods to implement the grammar. Please note that the sample parser code is just
an example to show you how the YACC portion of PLY works and VAR may not (probably should not) be a
part of ‘stmt’ for your final solution.
Deliverables
1. parser_jhedid.py: a Python file containing the code for generating tokens and for performing
syntax analysis against the grammar (most likely this will be the code from the main.py in your
REPL)
2. A PDF named syntax_jhedid.pdf containing answers to the following question:
“For each of the following statements, indicate whether it is VALID or INVALID for the given
grammar. If you choose INVALID, please state whether the error is LEXICAL or SYNTACTIC and
explain why.”
• x = 0;
• VAR x = y ** 2
• VAR VAR = x/2
• VAR x = VAR y = 3
• x = "string" + "123"
• "VAR" X = "str"
• i = (2*2)**3
• VAR x = 1 + 2 * 3 / 5 **6
• VAR x = 1 + 2 * 3 / 5 **6 = y
• x=1*2/3**4