Show understanding of how the grammar of a language can be expressed using syntax diagrams or Backus-Naur Form (BNF) notation

16.2 Translation Software (Compilers & Interpreters)

Objective

Show that you can:

  • Express the grammar of a programming language using syntax diagrams (railroad diagrams) or Backus‑Naur Form (BNF) notation.
  • Describe each stage of a translator (compiler or interpreter) and explain its purpose.
  • Relate the grammar description to the parser‑generation tools used in practice.

1. Why Grammar Matters in Translation Software

  • A translator must decide whether a sequence of symbols is a legal program. This decision is made by checking the program against the language’s grammar.
  • Grammar is a formal set of rules that define the syntactic structure of all valid programs.
  • Formal descriptions (BNF or syntax diagrams) allow the rules to be understood by both humans (for documentation) and machines (for automatic parser generation).

2. The Translation Process – Stages of a Compiler/Interpreter

The Cambridge syllabus expects you to recognise each stage and its purpose. The diagram below shows a typical pipeline; an interpreter follows the same stages, but the final “Code Generation” stage produces an intermediate form that is executed directly.

Compiler pipeline: source → lexical analyser → syntax analyser → semantic analyser → optimiser → code generator → target code

Figure 1: Typical translation pipeline

StagePurposeTypical Output
Lexical analysis (scanner)Breaks the source program into a sequence of tokens (identifiers, literals, operators, etc.).Token stream – e.g. int x = 5 ;[INT, IDENTIFIER, '=', NUMBER, ';']
Syntax analysis (parsing)Checks that the token sequence follows the grammatical rules of the language.Parse tree / abstract syntax tree (AST)
Semantic analysisEnsures the program’s meaning is correct – type checking, scope resolution, declaration before use, etc.Annotated AST (type information, symbol‑table links)
OptimisationImproves the intermediate representation without changing its meaning (e.g., constant folding, dead‑code elimination).Optimised intermediate code
Code generationTranslates the optimised representation into target code – machine code, assembly, or an intermediate form such as Reverse Polish Notation (RPN).Target program (executable, byte‑code, etc.)

2.1 Interpreter vs. Compiler – Trade‑offs

  • Compiler: translates the whole program before execution.

    Pros: fast execution, whole‑program optimisation.

    Cons: longer start‑up time, platform‑specific output.

  • Interpreter: analyses and executes statements one at a time.

    Pros: immediate feedback, platform‑independent.

    Cons: slower execution, limited optimisation.

3. Formal Ways to Describe Grammar

3.1 Syntax Diagrams (Railroad Diagrams)

  • Visual, flow‑chart‑style representation.
  • Each diagram has a start point, a series of nodes (terminals or non‑terminals), and an end point.
  • Branches represent alternatives (the “|” operator); loops represent repetition (“{ }”).
  • Optional parts are shown as a branch that may be skipped.

Syntax diagram for an if‑statement

Figure 2: Syntax diagram for if ( condition ) statement

3.2 Backus‑Naur Form (BNF)

BNF expresses a grammar as a set of production rules of the form

<non‑terminal> ::= definition

where definition may contain:

  • Terminals – literal symbols, written in quotes (e.g. "if").
  • Non‑terminals – names of other rules, enclosed in angle brackets (e.g. <expr>).
  • Operators:

    • | – choice (alternatives)
    • { } – repetition (zero or more)
    • [ ] – optional (zero or one)
    • ( ) – grouping

4. Example Language – Simple Arithmetic Expressions

4.1 BNF for the language

<expr> ::= <term> { ("+" | "-") <term> }

<term> ::= <factor> { ("*" | "/") <factor> }

<factor> ::= <integer> | "(" <expr> ")"

<integer> ::= <digit> { <digit> }

<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

4.2 Corresponding Syntax Diagrams

Each BNF rule can be turned into a diagram. Below is a textual description of the diagram for <expr>:

  • Start → <term> → Loop:

    • Branch: “+” or “–” → <term>
    • The loop may repeat zero or more times.

  • End

Syntax diagram for <expr>

Figure 3: Syntax diagram for <expr> (addition / subtraction loop)

5. Converting Between BNF and Syntax Diagrams

  1. Identify every non‑terminal in the BNF.
  2. For each production, draw a linear path from the start node to the end node.
  3. Use a branching node to represent alternatives (the “|” symbol).
  4. Use a loop back to represent repetitions (the “{ }” construct).
  5. Optional parts (“[ ]”) become a branch that may be skipped.
  6. To go the other way, read a diagram left‑to‑right, list the symbols encountered, and translate branches/loops back into BNF operators.

6. Comparing Syntax Diagrams and BNF

AspectSyntax DiagramsBNF Notation
RepresentationVisual, flow‑chart styleTextual, rule‑based
Readability for beginnersOften clearer because of graphicsRequires familiarity with BNF symbols
Machine processingNeeds conversion to a formal grammar before a parser can be generatedDirectly usable by parser generators (e.g., yacc, ANTLR)
Expressiveness / sizeCan become cumbersome for large grammarsCompact even for complex languages

7. Practical Use in Translation Software

  • The grammar is first written in BNF (or an extended form such as EBNF).
  • Tools like yacc, bison, ANTLR or Cambridge‑specific JFlex + CUP read the BNF and automatically generate a parser.
  • Syntax diagrams are valuable for:

    • Documentation of the language for programmers.
    • Manual inspection when debugging a grammar.
    • Teaching, because they illustrate the flow of tokens.

8. Quick‑Look‑Ahead Checklist – Does Your Material Cover the Required Syllabus?

Syllabus Unit (AS)Covered in these notes?Action Needed
1 Information representationAdd a concise binary‑systems cheat‑sheet and a short conversion activity.
1.2 Multimedia – graphics & soundInclude bitmap/vector examples and a file‑size calculation exercise.
1.3 Compression (lossy vs. lossless, RLE)Provide a mini‑lab compressing a text file with RLE.
2 Communication (LAN/WAN, topologies, IP, DNS, cloud)Insert IPv4/IPv6 notation, subnetting basics, and public‑vs‑private IP notes.
3 Hardware (components, RAM/ROM, embedded, buffers)Add a table contrasting SRAM/DRAM and PROM/EPROM/EEPROM.
3.2 Logic gates & circuitsProvide a half‑adder circuit, its Boolean expression, and truth table.
4 Processor fundamentals (Von Neumann, registers, ALU, CU, buses, F‑E cycle, interrupts)Insert a labelled fetch‑execute diagram and an “interrupt scenario” worksheet.
4.2 Assembly language (two‑pass assembler, addressing modes)Include a small load‑add‑store program and ask students to trace each addressing mode.
4.3 Bit manipulation (shifts, masks, test‑set)Provide bit‑mask exercises (e.g., set/clear a flag).
5 System software (OS functions, utilities, libraries)List typical utilities and map them to OS management tasks.
5.2 Language translators (assembler, compiler, interpreter, IDE features)Ensure the comparison table of compile‑time vs. run‑time error detection is present.
6 Security, privacy & data integrityAdd a short case‑study of a phishing attack and a data‑validation checklist.
7 Ethics & ownershipInsert a debate prompt on open‑source vs. proprietary licences.
8 Databases (relational concepts, ER diagrams, normalisation, DDL/DML)Provide a brief ER‑diagram example and normalisation exercise.

9. Summary

  • Translation software follows a well‑defined pipeline: lexical analysis → syntax analysis → semantic analysis → optimisation → code generation.
  • Grammar is the cornerstone of syntax analysis; it can be expressed as BNF (concise, machine‑readable) or as syntax diagrams (visual, intuitive).
  • Both representations are interchangeable; the ability to move between them is a key exam skill.

10. Quick Reference Box

Key points to remember

  • Lexical analyser → tokens.
  • Parser checks tokens against the grammar (BNF or diagram).
  • Semantic analyser adds meaning (types, scope).
  • Optimiser improves efficiency; code generator produces target code.
  • BNF: <non‑terminal> ::= …. Use |, { }, [ ].
  • Syntax diagram: start → nodes → end; branches = alternatives, loops = repetitions, optional = skip‑branch.

11. What’s Next? – Other Syllabus Units

After mastering translation software, the Cambridge syllabus continues with:

  • Unit 16.1 – Operating Systems (processes, memory management, file systems).
  • Units 1‑15 – Data representation, communication, hardware, software development, algorithms, etc.
  • Units 17‑20 (A‑Level extensions) – Advanced data structures, networking, security, artificial intelligence.

Understanding how a compiler interacts with the operating system (loading, linking, I/O) will help you see the whole picture.

12. Advanced A‑Level Extension (Sidebar)

For students aiming at the A‑Level

  • User‑defined data types – extending a grammar to describe struct or class definitions.
  • Floating‑point representation – the compiler’s role in converting literals to IEEE‑754 format.
  • Encryption & security – how compilers can embed cryptographic checks (e.g., code signing).
  • Recursion and exception handling – semantic analysis must build call‑graphs and verify try‑catch blocks.

13. Practice Questions

  1. Write BNF rules for a language that supports while loops with the syntax while ( condition ) statement.
  2. Convert the following BNF rule into a textual description of a syntax diagram:

    <assignment> ::= <identifier> "=" <expr> ";"

  3. Identify two advantages of using BNF over syntax diagrams when creating a large‑scale compiler.
  4. Given the syntax diagram for an if‑else statement (see Figure 2), write the equivalent BNF rule.
  5. Explain the role of the lexical analyser in the translation pipeline and give an example of how the source fragment int x = 10 ; is tokenised.
  6. Briefly compare an interpreter and a compiler in terms of execution speed, platform dependence, and typical use‑cases.