Alxa_Compiler(SysY to PCODE)
- Published on
设计有限自动机进行词法分析,处理非法字符,建立抽象语法树,使用递归下降方法进行语法分析和错误处理,将SysY语言编译为PCode代码,最终设计了对应的虚拟机进行解释执行。
Lexical Analysis 词法分析
Before Coding
Requirement: Read testfile.txt
, parse every character into words and print them. At the same time, memorize the type, content, and line number of each word.
File reading
Read by line, scan every character of every string and analyze.
while ((s = bf.readLine()) != null) {
...
}
Analyse
Upon identifying the keyword, proceed to the next analysis.
while ((c = getChar()) != null) {
if (c == ' ' || c == '\r' || c == '\t') {
continue;
} else if (c == '+' || c == '-' || c == '*' || c == '%') {
words.add(new Word(c));
} else if (c == '/') {
analyseSlash();
} else if (c == '(' || c == ')' || c == '[' || c == ']' || c == '{' || c == '}') {
words.add(new Word(c));
} else if (c == '>' || c == '<' || c == '=' || c == '!') {
analyseRelation(c);
} else if (c == ',' || c == ';') {
words.add(new Word(c));
} else if (c == '"') {
analyseCitation();
} else if (c == '&' || c == '|') {
analyseLogic(c);
} else if (Character.isDigit(c)) {
analyseDigit(c);
} else if (Character.isLetter(c) || c == '_') {
analyseLetter(c);
}
}
Common
For example, when encountering ’+’, directly create a new Word and typify it as “PLUS”.
Function
For example
When encountering <
, enter the function analyseRelation
to read one more character. If it is =
, analyze as LEQ
…
if (c == '<') {
c = getChar();
if (c == '=') {
words.add(new Word("<="));
} else {
unGetChar();
words.add(new Word("<"));
}
analyseLogic
is as the same.
Digit and Letter
Digit: When encountering a digit, scan a series of digits and turn them into a Word typified as “INTCON”.
Letter: When encountering a letter, scan a string of letters or digits. It may be an “IDENFR” or “STRCON”, depending on whether it is in the key map or not.
Word
class Word:
public class Word {
private String identification;
private String content;
private String type;
}
Encapsulate the initial function so that only new Word(...)
is needed in the main processor, which will create the corresponding word.
For example
public Word(char identification) {
this.identification = String.valueOf(identification);
this.type = new KeyWordMap().getType(this.identification);
this.content = this.identification;
}
As for KeyWordMap, it is a HashMap that maps the string of a word to its type.
public KeyWordMap() {
keyWords = new HashMap<>();
keyWords.put("main", "MAINTK");
keyWords.put("const", "CONSTTK");
keyWords.put("int", "INTTK");
...
}
After Coding
File reading
Reading the file line by line is not convenient for prereading and undoing, so the file is read into a single String at first.
The method involves reading the file line by line, adding \n
after every line, and scanning every character. When \n
is encountered, lineNum++
private String transferFileToCode() {
BufferedReader bf = new BufferedReader(reader);
StringBuffer buffer = new StringBuffer();
String s = null;
while ((s = bf.readLine()) != null) {
buffer.append(s).append("\n");
}
return buffer.toString();
}
Analyse
Regarding analysis, it is different from what was described before coding.
First, words need to be analyzed one by one, so a global variable index
is added to remember the pointer’s position.
Additionally, situations may arise where reading one more character or undoing a read is necessary, so the functions ungetChar
and getChar
are encapsulated to facilitate the analysis.
private Character getChar() {
if (index < code.length()) {
char c = code.charAt(index);
if (c == '\n') {
lineNum++;
}
index++;
return c;
} else {
return null;
}
}
private void unGetChar() {
index--;
char c = code.charAt(index);
if (c == '\n') {
lineNum--;
}
}
Slash
//
: When it comes to\n
, stop.
do {
c = getChar();
if (c == null || c == '\n') {
return;
// 判断为//注释,结束分析
}
} while (true);
/* */
: Get char until*/
appears.
do {
c = getChar();
if (c == null) {
return;
}
if (c == '*') {
c = getChar();
if (c == '/') {
return;
// 判断为/* */注释,直接结束分析
} else {
unGetChar();
}
}
} while (true);
Grammar Analysis 语法分析
Requirement: Based on the words identified by the lexical analysis program, identify various grammatical elements according to the grammatical rules. The recursive descent method is used to analyze the grammatical components defined in the grammar.
Before Coding
Data Reading
Similar to the lexical analysis, functions like getWord
and getNextWord
are prepared. Additionally, there is a global variable (Word) curWord
to indicate the current word when reading ArrayList<Word> words
from the lexical analysis one by one.
The analysis strategy is as follows:
- For normal rules: Keep getting words and analyze them.
- For expression rules: First, scan the entire expression using the function
getExp
. Then, divide the expression and use the recursive descent method to analyze it.
getExp
like
private ArrayList<Word> getExp() {
ArrayList<Word> exp = new ArrayList<>();
while (true) {
if (word is symbol of end) {
break;
}
...
getWordWithoutAddToGrammar();
exp.add(curWord);
word = getNextWord();
}
return exp;
}
recursive descent
According to grammatical rules, code functions for each term of the rule.
Main idea: Read a word, check what it symbolizes, and enter the next analyzing function.
For example:
to
CompUnit → {Decl} {FuncDef} MainFuncDef // 1.是否存在Decl 2.是否存在 FuncDef
I analyze like this:
private void analyseCompUnit() {
Word word = getNextWord();
while (word.typeEquals("CONSTTK") || (
word.typeEquals("INTTK") && getNext2Word().typeEquals("IDENFR") && !getNext3Word().typeEquals("LPARENT"))) {
analyseDecl();
word = getNextWord();
}
while (word.typeEquals("VOIDTK") || (
(word.typeEquals("INTTK") && !getNext2Word().typeEquals("MAINTK")))) {
analyseFuncDef();
word = getNextWord();
}
if (word.typeEquals("INTTK") && getNext2Word().typeEquals("MAINTK")) {
analyseMainFuncDef();
} else {
error();
}
grammar.add("<CompUnit>");
}
grammar
is used to memorize the output of both the lexical analysis and the grammar analysis lists.
left recursion
加减表达式 AddExp → MulExp | AddExp ('+' | '−') MulExp // 1.MulExp 2.+ 需覆盖 3.- 需覆盖
Check if the expression contains ’+’ or ’-’. If it does, separate the expression into AddExp
and MulExp
. Then analyze them separately.
After Code
left recursion
The previous method is not perfect for recursive descent. Therefore, the approach has been revised and rewritten.
to
加减表达式 AddExp → MulExp | AddExp ('+' | '−') MulExp // 1.MulExp 2.+ 需覆盖 3.- 需覆盖
Rewrite it like
AddExp → MulExp ('+' | '−') MulExp ('+' | '−') MulExp ...
Code like
private void analyseMulExp(ArrayList<Word> exp) {
Exps exps = divideExp(exp, new ArrayList<>(Arrays.asList("MULT", "DIV", "MOD")));
int j = 0;
for (ArrayList<Word> exp1 : exps.getWords()) {
analyseUnaryExp(exp1);
grammar.add("<MulExp>");
if (j < exps.getSymbols().size()) {
grammar.add(exps.getSymbols().get(j++).toString());
}
}
}
Function divideExp
is used to divide the entire expression passed by getExp
or a preceding function.
divideExp
:
Input:
- Original expression:
exp
- Stop symbol:
symbol
Output:
- List of divided expressions and symbols.
private Exps divideExp(ArrayList<Word> exp, ArrayList<String> symbol) {
ArrayList<ArrayList<Word>> exps = new ArrayList<>();
ArrayList<Word> exp1 = new ArrayList<>();
ArrayList<Word> symbols = new ArrayList<>();
boolean unaryFlag = false;
int flag1 = 0;
int flag2 = 0;
for (int i = 0; i < exp.size(); i++) {
Word word = exp.get(i);
if (word.typeEquals("LPARENT")) {
flag1++;
}
if (word.typeEquals("RPARENT")) {
flag1--;
}
if (word.typeEquals("LBRACK")) {
flag2++;
}
if (word.typeEquals("RBRACK")) {
flag2--;
}
if (symbol.contains(word.getType()) && flag1 == 0 && flag2 == 0) {
//UnaryOp
if (word.typeOfUnary()) {
if (!unaryFlag) {
exp1.add(word);
continue;
}
}
exps.add(exp1);
symbols.add(word);
exp1 = new ArrayList<>();
} else {
exp1.add(word);
}
unaryFlag = word.typeEquals("IDENFR") || word.typeEquals("RPARENT") || word.typeEquals("INTCON") || word.typeEquals("RBRACK");
}
exps.add(exp1);
return new Exps(exps, symbols);
}
Exps
public class Exps {
private ArrayList<ArrayList<Word>> words;
private ArrayList<Word> symbols;
}
other bugs
Most bugs are produced by the functions getExp
and divideExp
due to some overlooked situations, often resulting in errors like index out of range. Therefore, adjustments were made to some symbols for stopping the expression parsing and modifications were made to the rules for dividing or not dividing the expression, among other changes.
Error Handling 错误处理
Before Coding
Create the symbol table
Symbol class
public class Symbol {
private String type;
private int intType;
private String content;
private int area = 0;
}
Type represents the type of the symbol.
IntType
is an integer. If it’s 0, the symbol is an int. If it’s 1, the symbol is an int[]. If it’s 2, the symbol is an int[][], and so on.
Content
is its content.
Area
indicates where it is.
A HashMap
of Symbols is created to memorize symbols created in each area.
When entering a new area, area++
. When leaving an area, area--
, with the corresponding symbols being destroyed.
private HashMap<Integer, Symbols> symbols = new HashMap<>();
private HashMap<String, Function> functions = new HashMap<>();
private ArrayList<Error> errors = new ArrayList<>();
private int area = -1;
private boolean needReturn = false;
private int whileFlag = 0;
needReturn
indicates whether the current function needs to return.
whileFlag
indicates whether the current code block is within a while loop.
Errors
a
Just check format.
public boolean isFormatIllegal() {
for (int i = 1; i < content.length() - 1; i++) {
char c = content.charAt(i);
if (!isLegal(c)) {
if (c == '%' && content.charAt(i + 1) == 'd') {
continue;
}
return true;
} else {
if (c == '\\' && content.charAt(i + 1) != 'n') {
return true;
}
}
}
return false;
}
b c
B: Every time an identifier is encountered, check if the same symbol has already been defined in the current area.
private boolean hasSymbolInThisArea(Word word) {
return symbols.get(area).hasSymbol(word);
}
C: Check all areas. If the symbol has been defined, handle functions in the same manner.
private boolean hasSymbol(Word word) {
for (Symbols s : symbols.values()) {
if (s.hasSymbol(word)) {
return true;
}
}
return false;
}
d e
To check if the function parameters are matched, the parameters of every function are memorized. When a function call is encountered, the function call parameters are scanned and matched. A function is prepared to handle this. It was found that recursive descent is needed again, so the check procedure is added to the recursive descent of the grammatical analyzer. Please check the After Code/Error d and e
.
f g
There is a global variable needReturn
used to indicate if the current function needs to return. If it does, but there is no return at the end of the code block, or if it doesn’t, but there is a return, the error will be recorded.
h
Simply check if it is a constant.
if (isConst(word)) {
error("h", word.getLineNum());
}
i j k
Encapsulate the function for checking the missing symbol.
For example:
private void checkParent() {
if (getNextWord().typeEquals("RPARENT")) {
getWord();// )
} else {
error("j");
}
}
l
Count the number of parameters for string
and printf
separately and check if they are equal.
m
There is a global variable whileFlag
indicating if the code block is within a while loop. If it isn’t, any continue
or break
statements will produce an error.
After Coding
Area
I increment area++
when entering a block or a function, but this leads to a situation where the parameters of the function can’t be memorized in a different area from the block of the function. Therefore, I changed the rules for marking area++
.
private boolean analyseBlock(boolean fromFunc) {
...
if (!fromFunc) {
addArea();
}
...
}
Only when the block is not from the function, the area++.
Error d and e
To check if the function parameters are matched, an array is set for each function.
public class Function {
private String type;
private String content;
private String returnType;
private ArrayList<Integer> paras;
}
When encountering a function, its return type and parameters are memorized.
For the ArrayList<Integer> paras
, it reflects as follows:
Type | Example | Integer |
---|---|---|
Void | -1 | |
Int | a | 0 |
Int[] | a[] | 1 |
Int[] [] | a[] [3] | 2 |
So, when encountering a function call, the parameters will be checked against what has been memorized before.
private void checkParasMatchRParas(Word ident, ArrayList<Integer> paras, ArrayList<Integer> rparas) {
if (paras.size() != rparas.size()) {
error("d", ident.getLineNum());
} else {
for (int i = 0; i < paras.size(); i++) {
if (!paras.get(i).equals(rparas.get(i))) {
error("e", ident.getLineNum());
}
}
}
}
To get the real type of the parameters, the analysis procedure is added to the recursive descent of the grammatical analyzer. For example:
private int analyseExp(ArrayList<Word> exp) {
int intType = analyseAddExp(exp);
grammar.add("<Exp>");
return intType;
}
Every recursion will return an intType
, symbolizing the final type of the expression.
Because the terms of an expression must be of the same type, only one of them is returned.
This is the exit of the recursion. It will return the correct type of the expression to the top of the function.
private int analyseLVal(ArrayList<Word> exp) {
int intType = 0;
...
if (word.typeEquals("LBRACK")) {
intType++;
...
}
...
if (hasSymbol(ident)) {
return getSymbol(ident).getIntType() - intType;
} else {
return 0;
}
}
Code Generation 代码生成
In this part, I chose to generate Pcode.
I designed a type of Pcode which is an Inverse Polish notation expression stack and symbol table-based virtual code.
At the same time, I designed a virtual machine to execute it.
The Pcode virtual machine is an imaginary machine used to run Pcode commands. It consists of a code area (code), an instruction pointer (EIP), a stack, a var_table, a func_table, and a label_table.
In the following passage, I will first introduce how Pcode executes and then explain how to produce Pcode.
Before Coding
How does the virtual machine run
First, we need a codes
list and a stack
(int).
An eip
represents the address of the currently running code.
A varTable
memorizes the address of the variable in the stack.
A funcTable
memorizes the address of the function in the codes list.
A labelTable
memorizes the address of the label in the codes list.
Then, run the code one after another and manage the stack.
How to distinguish different variables
Before generating codes, differentiate variables from different scopes by their unique scope number, like: areaID + "_" + curWord.getContent()
. In this situation, a variable will not appear more than once in the codes, except for recursive function calls, which will be solved by pushing the varTable
to the stack (shown later).
Specific Code Definition
First, define a class for PCode:
public class PCode {
private CodeType type;
private Object value1 = null;
private Object value2 = null;
}
It represents one code object, which has a CodeType
and two operating values. CodeType
is an enum. Value1
and Value2
may be Integer, String, or null, depending on the specific code type.
Calculation Type
Two operators:
int b = pop();
int a = pop();
push(cal(a,b));
Single operator:
push(cal(pop()));
VAR
The VAR command declares a variable, saving the variable name and the address assigned to it in the variable table.
case VAR: {
Var var = new Var(stack.size());
varTable.put((String) code.getValue1(), var);
}
Var.class:
public class Var {
private int index;
private int dimension = 0;
private int dim1;
private int dim2;
}
DIMVAR
The DIMVAR command declares an array, setting the dimension information of the variable.
case DIMVAR: {
Var var = getVar((String) code.getValue1());
int n = (int) code.getValue2();
var.setDimension(n);
if (n == 1) {
int i = pop();
var.setDim1(i);
}
if (n == 2) {
int j = pop(), i = pop();
var.setDim1(i);
var.setDim2(j);
}
}
PLACEHOLDER
The PLACEHOLDER command grows the stack down, allocating new space for variables and arrays.
case PLACEHOLDER: {
Var var = getVar((String) code.getValue1());
int n = (int) code.getValue2();
if (n == 0) {
push(0);
}
if (n == 1) {
for (int i = 0; i < var.getDim1(); i++) {
push(0);
}
}
if (n == 2) {
for (int i = 0; i < var.getDim1() * var.getDim2(); i++) {
push(0);
}
}
}
Other
Calculation type: Pop the stack top once or twice, perform the calculation, and push the result back onto the stack.
Jump type: For jump commands, check if the condition is satisfied and change the eip
accordingly.
Function call: As follows
Function call procedure
First, before the function call, parameters are pushed onto the stack. Each parameter push is followed by an RPARA
command, which memorizes the address of the previous variable.
case RPARA: {
int n = (int) code.getValue1();
if (n == 0) {
rparas.add(stack.size() - 1);
} else {
rparas.add(stack.get(stack.size() - 1));
}
}
Second, the function CALL
.
Memorize the eip
, stack top address, and information about the function (these will be pushed onto the stack as well). Then update the varTable
and eip
, preparing to execute the function.
case CALL: {
Func func = funcTable.get((String) code.getValue1());
retInfos.add(new RetInfo(eip, varTable, stack.size() - 1, func.getArgs(), func.getArgs(), nowArgsNum));
eip = func.getIndex();
varTable = new HashMap<>();
callArgsNum = func.getArgs();
nowArgsNum = 0;
}
Finally, when it’s RET
, return.
Restore eip
and varTable
from RetInfo
, and clear the new information pushed onto the stack during the function call.
case RET: {
int n = (int) code.getValue1();
RetInfo info = retInfos.remove(retInfos.size() - 1);
eip = info.getEip();
varTable = info.getVarTable();
callArgsNum = info.getCallArgsNum();
nowArgsNum = info.getNowArgsNum();
if (n == 1) {
stack.subList(info.getStackPtr() + 1 - info.getParaNum(), stack.size() - 1).clear();
} else {
stack.subList(info.getStackPtr() + 1 - info.getParaNum(), stack.size()).clear();
}
}
Value or Address
Pushing the value or address of a variable is important and depends on the specific need, which will be explained when I describe how to generate codes.
The command action is as follows (getAddress
is used to get the address of the previous variable).
case VALUE: {
Var var = getVar((String) code.getValue1());
int n = (int) code.getValue2();
int address = getAddress(var, n);
push(stack.get(address));
}
...
case ADDRESS: {
Var var = getVar((String) code.getValue1());
int n = (int) code.getValue2();
int address = getAddress(var, n);
push(address);
}
Code Generation
Code is generated from the grammatical analysis procedure.
Declaration
There is no need to distinguish between constants and variables. When declaring a variable, create a new variable and let it point to the stack top. If it has an initialization, push the values one after another. If not, add a PLACEHOLDER
command to push something (I push 0) to the stack to hold the place.
Assignment Sentence
In this situation, first calculate and push the address of the variable to the stack top. Then analyze the expressions. After that, there will only be two numbers in the stack: the address and the value. Assign the value to the address.
Condition Control Sentence
First, generate labels. Then, place jump sentences in the proper places.
Labels for if
and while
will be generated and then stored in a stack-like structure, like:
whileLabels.add(new HashMap<>());
whileLabels.get(whileLabels.size() - 1).put("while", labelGenerator.getLabel("while"));
whileLabels.get(whileLabels.size() - 1).put("while_end", labelGenerator.getLabel("while_end"));
whileLabels.get(whileLabels.size() - 1).put("while_block", labelGenerator.getLabel("while_block"));
Take if
as an example:
if (word.typeEquals("IFTK")) {
codes.add(new PCode(CodeType.LABEL, ifLabels.get(ifLabels.size() - 1).get("if")));
...
analyseCond("IFTK");
...
codes.add(new PCode(CodeType.JZ, ifLabels.get(ifLabels.size() - 1).get("else")));
codes.add(new PCode(CodeType.LABEL, ifLabels.get(ifLabels.size() - 1).get("if_block")));
analyseStmt();
codes.add(new PCode(CodeType.JMP, ifLabels.get(ifLabels.size() - 1).get("if_end")));
codes.add(new PCode(CodeType.LABEL, ifLabels.get(ifLabels.size() - 1).get("else")));
if (word.typeEquals("ELSETK")) {
getWord(); //else
analyseStmt();
}
codes.add(new PCode(CodeType.LABEL, ifLabels.get(ifLabels.size() - 1).get("if_end")));
}
while:
if (word.typeEquals("WHILETK")) {
...
codes.add(new PCode(CodeType.LABEL, whileLabels.get(whileLabels.size() - 1).get("while")));
...
analyseCond("WHILETK");
...
codes.add(new PCode(CodeType.JZ, whileLabels.get(whileLabels.size() - 1).get("while_end")));
codes.add(new PCode(CodeType.LABEL, whileLabels.get(whileLabels.size() - 1).get("while_block")));
analyseStmt();
...
codes.add(new PCode(CodeType.JMP, whileLabels.get(whileLabels.size() - 1).get("while")));
codes.add(new PCode(CodeType.LABEL, whileLabels.get(whileLabels.size() - 1).get("while_end")));
whileLabels.remove(whileLabels.size() - 1);
}
// break
if (word.typeEquals("BREAKTK")) {
getWord();//break
codes.add(new PCode(CodeType.JMP, whileLabels.get(whileLabels.size() - 1).get("while_end")));
...
}
// continue
if (word.typeEquals("CONTINUETK")) {
getWord();//continue
codes.add(new PCode(CodeType.JMP, whileLabels.get(whileLabels.size() - 1).get("while")));
...
}
After Coding
Because of some runtime errors and information shortages, I added and removed some Pcode. At the same time, there are some new troubles with address passing and short-circuit calculation.
Specific Code Definition
In operations, push()
means putting a value onto the top of the stack. pop()
means removing the value from the top of the stack.
Common Type
CodeType | Value1 | Value2 | Operation |
---|---|---|---|
LABEL | Label_name | Set a label | |
VAR | Ident_name | Declare a variety | |
PUSH | Ident_name/Digit | push(value1) | |
POP | Address | Ident_name | *value1 = value2 |
JZ | Label_name | Jump if stack top is zero | |
JNZ | Label_name | Jump if stack top is not zero | |
JMP | Label_name | Jump unconditionally | |
MAIN | Main function label | ||
FUNC | Function label | ||
ENDFUNC | End of function label | ||
PARA | Ident_name | Type | Parameters |
RET | Return value or not | Function return | |
CALL | Function name | Function call | |
RPARA | Type | Get parameters ready for function call | |
GETINT | Get a integer and put it into stack top | ||
String | Para num | Pop values and print. | |
DIMVAR | Ident_name | Type | Set dimension info for array variety |
VALUE | Ident_name | Type | Get the variety value |
ADDRESS | Ident_name | Type | Get the variety address |
PLACEHOLDER | Push something to hold places | ||
EXIT | Exit |
CodeType | Value1 | Value2 | Operation |
---|---|---|---|
ADD | + | ||
SUB | - | ||
MUL | * | ||
DIV | / | ||
MOD | % | ||
CMPEQ | == | ||
CMPNE | != | ||
CMPGT | > | ||
CMPLT | < | ||
CMPGE | >= | ||
CMPLE | <= | ||
AND | && | ||
OR | || | ||
NOT | ! | ||
NEG | - | ||
POS | + |
short circuit calculation
There are two situations where the short-circuit calculation is needed:
1. if(a&&b) // a is false
2. if(a||b) // b is true
This was not an easy task, and I actually spent a lot of time solving it.
Here is my method:
First, when analyzing analyseLOrExp
, every analyseLAndExp
is followed by a JNZ
, which is used to detect if the condition is false. If it is, it jumps to the if body label. At the same time, I generated a condition label, which is ready for the analyseLAndExp
.
private void analyseLOrExp(ArrayList<Word> exp, String from) {
...
for (...) {
...
String label = labelGenerator.getLabel("cond_" + i);
analyseLAndExp(exp1, from, label);
codes.add(new PCode(CodeType.LABEL, label));
if (...) {
codes.add(new PCode(CodeType.OR));
}
if (...) {
if (...) {
codes.add(new PCode(CodeType.JNZ, ifLabels.get(ifLabels.size() - 1).get("if_block")));
}
...
}
...
}
}
In the analyseLAndExp
, every analyseEqExp
is followed by a JZ
, which is used to detect if the condition is true. If it is, it jumps to the condition label I set earlier.
private void analyseLAndExp(ArrayList<Word> exp, String from, String label) {
...
for (...) {
...
analyseEqExp(exp1);
if (...) {
codes.add(new PCode(CodeType.AND));
}
if (...) {
if (...) {
codes.add(new PCode(CodeType.JZ, label));
}
...
}
}
}
By these means, short-circuit evaluation is achieved.
总结感想 Summary
先说说关于选择PCODE,选择了PCODE,早早结束了编译实验的全部任务,也没经过很多人在选择mips85+还是选择PCODE+85的痛苦,私以为我不愿陷入焦虑去纠结给分高低,也不想花大量时间去研究怎么生成mips和卷优化,因为这不是我想做和喜欢的事情。当然,生成PCODE反而是我感觉更能让我觉得有趣的事情,无需遵循某种既定的体系(虽然这可能对我没有好处),而是自己设计了PCODE指令,一边从递归下降中生成指令,一边写虚拟机解释器,在这之间,我不断修改指令生成和解释器,感觉自己能够完全地掌控这个我亲手写的编译器,最后让虚拟机完整地成功地跑代码,反而对我来说是一件很有成就感的事情。
至于这个文档为什么要写成英文,首先,想要更方便更直观地描述代码,类,对象,与其给每一个东西一个汉语解释,我认为干脆用英文来描述更显自然,第二,想在写文档的同时顺便练练英语,仅此而已,不是想给助教和看我文档的人添麻烦,如果很不方便阅读,可以翻译成汉语再读,用有道翻译完还是可以看懂的,实在不行也可以来垂询我本人。总结当然就不写英语了。
关于一些反思,在语法分析和代码生成两部分,我还是没有设计好架构就开始写,完工后出现了较多bug,甚至在代码生成de到了语法分析时候的bug,在面向对象课程的时候,已经多次反思过这件事,这实属不该,在今后的大型工程中,要继续好好注意架构设计。也非常感谢在我debug过程中帮到我的zyq,lyx,dky同学。
最后,站在我个人的角度,我认为从词法分析,语法分析,错误处理,到代码生成这四个任务,已经很好的做到了让我对一个编译器体系架构充分理解并让我把它完整的做出来,对我个人而言,这已经完整了,我算是轻松愉快地(除了在语法分析和代码生成两次任务debug的时候属实不太顺利)完成了编译实验,很有成就感,且收获颇丰。