python program in intro to computer architecture
Spring 2023 CIS 580 Project 3 (Due Apr. 5)
1. Objectives
This project is designed to help students to understand the RISC architecture (MIPS) and its
instruction set and assembly. Students will gain programming experience in Python.
2. Goals
Your team (2 persons) will build an MIPS assembler for a subset of MIPS instructions in Python.
This assembler will read a simple MIPS program and generate an MIPS machine code output file.
3. Specifications
3.1 Input
Your assembler will read and parse the contents of a simple MIPS program (program.asm). The
file name of program.asm is given from the command line. Each line of the program contains an
MIPS instruction or a directive. A label will appear on a line by itself. Operands are comma-
separated. No line is blank. No comment is allowed. There will be no white spaces between
operands. All lines will begin with a tab except labels. Only decimal numbers are allowed.
Lines containing instructions have the following format:
<tab>instruction<tab>comma-separated-operands
Lines containing directives have the following format:
<tab>directive[<tab>comma-separated-operand]
Lines containing labels have the following format:
label:
The supported instructions are as follows. You may wish to consult additional MIPS references or
the textbook to know the details of the instructions.
Instruction Name Syntax Semantic
ADD Addition add $1,$2,$3 $1 = $2 + $3
SUB Subtract sub $1,$2,$3 $1 = $2 - $3
SLL Shift left logical sll $1,$2,5 $1 = $2 << 5
SRL Shift right logical srl $1,$2,5 $1 = $2 >> 5
SLT Set less than slt $1,$2,$3 If $2<$3, $1=1; otherwise, $1=0
ADDI Addition immediate addi $1,$2,45 $1 = $2 + 45
LUI Load upper immediate lui $1,45 Upper 16-bit of $1 = 45 (Lower 16-bit of $1 is set to 0)
ORI Or immediate ori $1,$2,45 $1 = $2 | 45 (bitwise OR)
LW Load word lw $1,100($2) $1 = Memory[$2+100]
SW Store word sw $1,100($2) Memory[$2+100] = $1
BEQ Branch on equal beq $1,$2,Label If $1=$2, jump to Label
BNE Branch on not equal bne $1,$2,Label If $1≠$2, jump to Label
J Jump j Label Jump to Label
LA * Load address la $1,Label lui $1, upper 16-bit of Label ori $1, $1, lower 16-bit of Label
* LA is a pseudo-instruction. It is used to load the memory location (Label, 32 bits) into
the destination register. The assembler replaces it with two instruction sequence, lui
followed by ori.
The supported directives are as follows.
Directive Name Syntax Semantic
.data Data segment .data (no operand)
Data section in memory; it begins at 0 (0x0000 0000) by default
.text Text segment .text (no operand)
Program section in memory; it begins at 512 (0x0000 0200) by default
.space n Allocation of n words of memory
.space 10 Allocation of 10 words (40 bytes) of memory
.word w Allocation of a word and initialized to w
. word 16 Allocation of a word (4 bytes) and initialized to 16 (0x0001 0000).
* The program may contain data section only or text section only. If both exist, the data section
proceeds the text section.
The registers are denoted as $0, $1, $2, etc. instead of “$s0” or “r1”. $0 always takes the value of
0 and is not changeable.
3.2 Output
The assembler generates an output file (program.out) of size 1KB consisting of 512B of data
segment (begins at 0x0000) and 512B of text segment (begins at 0x0200). Since each data and
instruction is 4B long, there will be at maximum 128 word data and at maximum 128 instruction
words. This file is a binary file. Do not try to open this file as it causes an error. The
assembler does not detect syntax errors and assumes the assembly input is correctly formed. We
assume big-endian.
For a sample assembly input file (test.asm):
.data
print_data:
.word 0
add_result:
.word 0
load_data: .word 1
.word 2
.text
start: la $1,load_data
lw $4,0($1)
lw $5,4($1) add $4,$4,$5
la $1,add_result
sw $4,0 ($1)
Then, the contents of the output file (test.out) will be as follows. Again, this is a binary file and
can only be seen with a special tool.
• In Linux, you will have to use the command line command “od -Ax -t x4” (octal dump).
To see the text segment (offset 0x0200), please use the command “od -Ax -t x4 -j512”.
• In Windows, you can use “format-hex.” Click the Start menu button and type
"powershell" (without the quotation marks). In the "Windows PowerShell" window,
please use the command line command “format-hex”.
• You can also view it online: https://hexed.it/
00000000 00000000 00000001 00000002
00000000 00000000 00000000 00000000
……………
3c010000 34210008 8c240000 8c250004
00852020 3c010000 34210004 ac240000
……………
Note that the file begins with the data segment (512 bytes) followed by the text segment.
“3c010000 34210008” denotes the assembled output for the LA instruction (LUI and ORI).
“8c240000” denotes the LW instruction. And so on.
Encoding of MIPS instructions is as follow. You may wish to consult additional MIPS references
or the textbook to know the details of the instructions.
Instruc- tion
Syntax MIPS instruction encoding (32 bits)
(R-type) Opcode(6) Rs(5) Rt(5) Rd(5) Shamt(5) Func(6)
ADD add $1,$2,$3 000000 N/A 100000
SUB sub $1,$2,$3 000000 N/A 100010
SLL * sll $1,$2,5 000000 N/A 000000
SRL ** srl $1,$2,5 000000 N/A 000010
SLT slt $1,$2,$3 000000 101010
(I-type) Opcode(6) Rs(5) Rt(5) Immediate (16)
ADDI addi $1,$2,45 001000
LUI lui $1,45 001111 N/A
ORI ori $1,$2,45 001101
LW lw $1,100($2) 100011
SW sw $1,100($2) 101011
BEQ beq $1,$2,Label 000100
BNE bne $1,$2,Label 000101
(J-type) Opcode(6) Offset(26)
J j Label 000010
Please note the use of labels in the branch (beq and bne) instructions. You will need to compute
the appropriate immediate fields for the machine code based on the following relationship.
Addr(Label) = Addr(inst_after_branch) + immediate(16 bits)*4
First, the immediate field represents the distance, in instructions rather than in bytes, between the
branching instruction and the destination instruction. This explains *4 at the end of the
relationship. Second, typically PC-relative addressing is relative to PC+4, not PC. That is, it is
relative to the next instruction, not the current instruction. This explains
“Addr(inst_after_branch)” in the relationship.
The Label field of the jump instruction will be defined using pseudo-direct addressing where the
address of the destination is defined as
Addr(Label) = Addr(inst_after_jump)[31-28] || immediate(26 bits) || 00
where Addr(inst_after_jump)[31-28] is the 4 most significant bits of PC+4, and || denotes
concatenation.
3.3 Testing
A few sample test assembly files will also be provided about a week before the project is due.
Make use of the sample executable to help verify your output.
You can test your design using your own programs. The quality of your assembler will be
determined by how much and how varied the tests are. For example, testing if the assembler
works for BNE for both the equal and not equal cases, and forward and backward branching, and
all combinations of these, would get a better grade than just testing for one case of BNE.
To confirm your assembler generates the correct machine code, you can get some help from
Internet, e.g., https://www.csfieldguide.org.nz/en/interactives/mips-assembler/ or
https://alanhogan.com/asu/assembler.php. Just type your assembly program such as “add $s1, $s2,
$s3”, it will output machine code for you. (Note that this assembler recognizes $s1, $s2, etc. but
does not recognize $1, $2, etc.)
4. Turning it in
Each group submits the source code (assembler.py) electronically by using the following command
on the grail machine:
turnin -c cis580x -p proj3 assembler.py At the top of the source code as comments, please list CLEARLY all the instructions and
directives your program cannot handle and all known issues with your program (those that are not
implemented, those that are implemented but not work correctly, etc.).
Start on time and good luck. If you have any questions, send e-mail to
v.mandalia@vikes.csuohio.edu or w.xiong15@csuohio.edu.