-
System software
- consists of a variety of programs that support the operation of a computer – Text editor, Compiler, Loader or Linker, Debugger, Assembler, Marco processor, Operating system
- differs in machine dependency, it is often machine dependent
-
System programs
intended to support the operation and use of the computer itself, rather than any particular application. For this reason, they are usually related to the architecture of the machine on which they are to run.
-
sic
simplified instructional computer
-
sic/xe
simplified instructional computer extra equipement or extra expensive
-
memory in sic
consists of 8-bit bytes, any 3 consecutive bytes for a word
-
addresses in sic
all adddresses in sic are byte addresses, words are addressed by the location of the their lowest numbered byte
-
sic total memory size
There are a total of 32,768 (215) bytes in SIC memory
-
registers in sic
- There are five registers, all of which have special uses.
- Each register is 24 bits in length
-
register in sic A 0
accumulator
All arithmetic operations involve register A and a word in memory
-
register in sic X 1
index register, used for addressing
-
register in sic L 2
linkage register, jump to subroutine instruction stores the return address in this register
-
register in sic PC 8
program counter, contains the address of the next instruction to be fetched for execution
-
register in sic SW 9
status word, contains a variety of info, including condition code(CC)
-
Integers
stored as 24-bit binary numbers; 2’s complement representation is used for negative values
no floating point in sic
-
addressing mode
8 bite opcode + 1 bit indicator(direct or indirect) + 15 bit address
-
COMP
compares the value in register A with a word in memory; this instruction sets a condition code CC to indicate the result (<, =, or >).
-
JSUB and RSUB
provided to subroutine linkage
-
input and output in sic
are performed by transferring 1 byte at a time to or from the rightmost 8 bits of register A
Each device is assigned a unique 8-bit code
-
Three I/O instructions
TD (Test Device), RD (Read Data), WD (Write Data)
-
Maximum memory on a SIC/XE
1 Mbyte
-
register in sic/xe B 3
base register, used for addressing
-
-
general working register no special use
-
register in sic/xe T 5
general working register no special use
-
register in sic/xe F 6
floating point accumulator(48 bits)
-
Data Formats
SIC/XE provides the same data formats as SIC
In addition, a 48-bit floating-point data type is also provided. See the top of Page 8 1 bit sign, 11 bit exponent and 36 bit fraction
-
Instruction Formats in sic/xe
format 1
8 bit opcode
-
Instruction Formats in sic/xeformat 2
8 bit opcode, 2 four bit registers
-
Instruction Formats in sic/xeformat 3
6 bit opcode, nixbpe, 12 bit displacement
-
Instruction Formats in sic/xeformat 4
6 bit opcode nixbpe, 20 bit address
-
base relative addresing
b=1, 0<=disp<=4095
-
program counter addressing
p=1 -2048<=disp<=2047
-
direct addressing
both b and p are set to 0 in formats 3 and 4
-
indexed addressing
if bit x is set to 1, the term (X) is added in the target address calculation
-
Immediate addressing
if bits i=1 and n=0, the target address itself is used as the operand value; no memory reference is performed
-
Indirect addressing mode
if bits i=0 and n=1, the word at
the location given by the target address is fetched; the value contained in this word is taken as the address of the operand value
-
Simple addressing mode
if bits i and n are both 0 or both 1, the target address is taken as the location of the operand
-
Instruction Set
includes all in sic and there are new ones to load and store the new registers LDB, STB, etc and perform floating point arithmetic ops(ADDF, SUBF, MULF, DIVF)
-
ADDR, SUBR, MULR, DIVR
Register-to-register arithmetic operations
-
svc
Supervisor call instruction
-
Traditional (CISC) Machines
The machines described in this section are classified as Complex Instruction Set Computers (CISC).
CISC machines generally have a relatively large and complicated instruction set, several different instruction formats and lengths, and many different addressing modes
The implementation of such architecture in hardware tends to be complex
-
RISC Machines
The RISC (Reduced Instruction Set Computers) concept, developed in the early 1980s, was intended to simplify the design of processors
This simplified design can result in faster and less expensive processor development, greater reliability, and faster instruction execution times
In general, a RISC system is characterized by a standard, fixed instruction length (usually equal to one machine word), and single-cycle execution of most instructions
-
assembler directives
START
Specify name and starting address for the program
-
assembler directives
END
Indicate the end of the source program and (optionally) specify the first executable instruction in the program;
-
assembler directive
BYTE
Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
-
assembler directive
WORD
Generate one-word integer constant
-
assembler directive
RESB
Reserve the indicated number of bytes for a data area
-
assembler direct
RESW
Reserve the indicated number of words for a data area.
-
The translation of source program to object code requires us to accomplish the following functions:
1) Convert mnemonic operation codes to their machine language equivalents – e.g., translates STL to 14 (line 10);
- 2) Convert symbolic operands to their equivalent machine addresses – e.g., translate RETADR to 1033 (line 10);
- 3) Build the machine instructions in the proper format;
- 4) Convert the data constants specified in the source program into their internal machine representations – e.g., translate EOF to 454F46 (line 80);
5) Write the object program and the assembly listing.
-
The first pass
scans the source program for label definitions and assigns addresses.
-
The second pass
performs most of the actual translation
-
In addition to translating the instructions of the source program, the assembler must process statements called
assembler directives (or pseudo-instructions).
-
object program
The assembler must write the generated object code onto some output device. This object program will later be loaded into memory for execution
-
The simple object program format contains three types of records
Header, Text, and End
-
The simple assembler uses two major internal data structures
the Operation Code Table (OPTAB) and the Symbol Table (SYMTAB
-
OPTAB
must contain (at least) the mnemonic operation code and its machine language equivalent. In more complex assemblers, this table also contains information about instruction format and length.
- During Pass 1, OPTAB is used to look up and validate operation codes in the source program.
- In Pass 2, it is used to translate the operation codes to machine language
-
SYMTAB
includes the name and value (address) for each label in the source program, together with flags to indicate error condition (e.g., a symbol defined in two different places).
- During Pass 1, labels are entered into SYMTAB as they are encountered in the source program, along with their assigned addresses (from LOCCTR).
- ? During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled instruction.
-
location counter LOCCTR
used to be a variable and help in the assignment of addresses. Whenever a label in the source program is read, the current value of LOCCTR gives the address to be associated with that label.
-
Prefix to operands:
@
#
+
@ - indirect addressing
# - immediate operands
+ - extended instruction format
-
Instructions that refer to memory are normally assembled using
either the program-counter relative or the base relative mode. The assemble directive BASE (Fig 2.5, line 13) is used in conjunction with base relative addressing
-
The main differences between Fig 2.5 (SIC/XE) and Fig 2.1 (SIC) involve
the use of register-to-register instructions (lines 150, 165). In addition, immediate addressing and indirect addressing have been used as much as possible (lines 25, 55, and 70).
-
However, the assembler can identify for the loader those parts of the object program that need modification. An object program that contains the information necessary to perform this kind of modification is called a relocatable program.
relocatable program.
-
modification record has the format
shown in P.64.
-
length field of a modification
stored in half-bytes (rather than byte)
-
starting location
field of a modification record is the location of the byte containing the leftmost bits of the address field to be modified
-
Literals
Note that a literal is identified with the prefix =, which followed by a specification of the literal value.
- 1. With immediate addressing, the operand value is assembled as part of the machine instruction.
- 2. With a literal, the assembler generates the specified value as a constant at some other memory location. The address of this generated constant is used as target address for the machine instruction.
-
literal pools
All of the literal operands used in a program are gathered together into one or more literal pools
-
LTORG
- 1. When the assembler encounters a LTORG statement, it creates a literal pool that contains all of the literal operands used since the previous LTORG (or the beginning of the program).
- 2. This literal pool is placed in the object program at the location where the LTORG directive was encountered
- 3. Of course, literals placed in a pool by LTORG will not be repeated in the pool at the end of the program.
-
user-defined symbols in assembler language programs appear
as labels on instructions or data areas. The value of such a label is the address assigned to the statement on which it appears
-
EQU
allows the programmer to define symbols and specify their value. The assembler directive generally used is EQU
-
uses of EQU
- Another common use of EQU is in defining mnemonic names for registers. For example: A EQU 0
- X EQU 1
- L EQU 2
- One common use of EQU is to establish symbolic names that can be used for improved readability in place of numeric values. +LDT +4096 → MAXLEN EQU 4096
- +LDT #MAXLEN
- When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB (with value 4096).
-
ORG
When this statement is encountered during assembly of a program, the assembler resets its location counter (LOCCTR) to the specified value. Since the values of symbols are taken from LOCCTR, the ORG statement will affect the values of all labels defined until the next ORG.
-
expressions
- Expressions are classified as either absolute expressions or relative expressions
- must be evaluated by the assembler to produce a single operand address or value
-
absolute expression
means independent of program location. A constant is an absolute term
-
A relative expression
is one in which all of the relative terms except one can be paired as described above; the remaining unpaired relative term must have a positive sign.
Example: 107 MAXLEN EQU BUFEND-BUFFER both BUFEND and BUFFER
-
Program blocks
are referred to be segments of code that are rearranged within a single object program unit, and control sections (appeared in next subsection) to be segments that are translated into independent object program units
-
USE
indicates which portions of the source program belong to the various blocks
-
program block arrangement
The assembler accomplishes this logical rearrangement of code by maintaining, during Pass 1, a separate location counter for each program block. Thus each label in the program is assigned an address that is relative to the start of the block that contains it.
-
program blocks
At the end of Pass 1,
the latest value of the location counter for each block indicates the length of that block. The assembler can then assign to each block a starting address in the object program
-
program blocks
For code generation during Pass 2
the assembler needs the address for each symbol relative to the start of the object program (not the start of an individual program block). This is easily found from the information in SYMTAB. The assembler simply adds the location of the symbol, relative to the start of its block, to the assigned block starting address
-
considerably reduced our addressing problems. Because the large buffer area is moved to the end of the object program, we no longer need to use extended format instructions on lines 15, 35, and 65
separation of the program into blocks
-
control section
is a part of the program that maintains its identity after assembly; each such control section can be loaded and relocated independently of the others. Different control sections are most often used for subroutines or other logical subdivisions of a program
-
Control sections differ from program blocks in that
they are handled separately by the assembler.
-
EXTDEF
The EXTDEF statement in a control section names symbols, called external symbols, that are defined in this control section and may be used by other sections
-
EXTREF
The EXTREF statement names symbols that are used in this control section and are defined elsewhere
-
Define record
gives information about external symbols that are defined in this control section – that is, symbols named by EXTDEF
-
A Refer record
lists symbols that are used as external reference by the control section – that is, symbols named by EXTREF.
-
-
assigns addresses to all external symbols
-
linking loader
Pass 2
erforms the actual loading, relocation, and linking.
-
ESTAB
main structure needed for the loader, the external symbol table
-
PROGADDR
program load address
is the beginning address in memory where the linked program is to be loaded. Its value is supplied to the loader by the OS
-
CSADDR
contains the starting address assigned to the control section currently being scanned by the loader. This value is added to all relative addresses within the control section to convert them to actual addresses.
-
load map
shows these symbols and their addresses. For the example of Figs 3.9 and 3.10, such a load map might look like as shown on the top of page 143
-
If more than one control section specifies a transfer address,
the loader arbitrarily uses the last one encountered.
-
If no control section contains a transfer address, .
the loader uses the beginning of the linked program (i.e., PROGADDR) as the transfer point
-
the loader uses the beginning of the linked program (i.e., PROGADDR) as the transfer point
often thought of as OS service functions
-
most loaders include
fewer different features than are found in a typical assembler. They include the use of an automatic library search process for handling external reference and some common options that can be selected at the time of loading and linking
-
Many linking loaders can
automatically incorporate routines from a subprogram library into the program being loaded
-
Linking loaders that support automatic library search must
keep track of external symbols that are referred to, but not defined, in the primary input to the loader.
-
Note that the subroutines fetched from a library in this way may themselves contain external references.
It is therefore necessary to repeat the library search process until all references are resolved.
-
loader options
option 1
- option 1: allows the selection of alternative sources of input.
- Ex.,
INCLUDE program-name (library-name)
-
loader options
option 2
- option 2: allows the user to delete external symbols or entire control sections.
- Ex.,
DELETE csect-name
- might instruct the loader to delete the named control section(s) from the set of programs being loaded.
- CHANGE name1, name2
- might cause the external symbol name1 to be changed to name2 wherever it appears in the object programs
-
loader options
option 3
- involves the automatic inclusion of library routines to satisfy external references. Ex., LIBRARY MYLIB
- Such user-specified libraries are normally searched before the standard system libraries. This allows the user to use special versions of the standard routines.
- NOCALL STDDEV, PLOT, CORREL
- To instruct the loader that these external references are to remain unresolved. This avoids the overhead of loading and linking the unneeded routines, and saves the memory space that would otherwise be required
-
Linkage editors
which perform linking prior to load time
-
dynamic linking
which the linking function is performed at execution time
-
linking loader
performs all linking and relocation operations, including automatic library search if specified, and loads the linked program directly into memory for execution
-
linkage editor
produces a linked version of the program (load module or executable image), which is written to a file or library for later execution
This means that the loading can be accomplished in one pass with no external symbol table required
performs relocation of all control sections relative to the start of the linked program. Thus, all items that need to be modified at load time have values that are relative to the start of the linked program.
-
Linkage editors can also be used to
build packages of subroutines or other control sections that are generally used together. This can be useful when dealing with subroutine libraries that support high-level programming languages.
-
linkage editors compared to linking loaders
in general tend to offer more flexibility and control
-
Linkage editors perform linking operations
the program is loaded for execution at load time
-
Dynamic linking, dynamic loading, or load on call
postpones the linking function until execution time: a subroutine is loaded and linked to the rest of the program when it is first called
avoid the necessity of loading the entire library for each execution except those necessary subroutines
-
A macro
represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros
-
Macro instructions allow the programmer
write a shorthand version of a program, and leave the mechanical details to be handled by the macro processor
-
most common use of macro processors
is in assembler language programming. However, macro processors can also be used with high-level programming languages, operating system command languages, etc.
-
MACRO
tells begining of macro definition
-
The macro name and parameters
define a pattern or prototype for the macro instructions used by the programmer
-
MEND
assembler directive marks the end of the macro definition.
-
The label on the macro invocation statement (CLOOP) has been retained as a label on the first statement generated in the macro expansion
This allows the programmer to use a macro instruction in exactly the same way as an assembler language mnemonic
-
After macro processing
the expanded file (Fig 4.2) can be used as input to the assembler
-
Statements in a subroutine appear
only once, regardless of how many times the subroutine is called
-
DEFTAB
contains the macro prototype and the statements that make up the macro body (with a few modifications). Comment lines from the macro definition are not entered into DEFTAB because they will not be part of the macro expansion
-
NAMTAB
The macro names are entered into NAMTAB, which serves as an index to DEFTAB. For each macro instruction defined, NAMTAB contains pointers to the beginning and end of the definition in DEFTAB
-
ARGTAB)
used during the expansion of macro invocations
- When a macro invocation statement is recognized, the arguments are stored in ARGTAB according to their position in the argument list.
- As the macro is expanded, arguments from ARGTAB are substituted for the corresponding parameters in the macro body
-
The procedure DEFINE
which is called when the beginning of a macro definition is recognized, makes the appropriate entries in DEFTAB and NAMTAB.
maintains a counter named level. each time the macro directive is read the value of level is incremented
The above process is very much like matching left and right parentheses when scanning an arithmetic expression
-
-
is called to set up the argument values in ARGTAB and expand a macro invocation statement
-
-
which is called at several points in the algorithm, gets the next line to be processed. This line may come from DEFTAB (the next line of a macro begin expanded), or from the input file, depending on whether the Boolean variable EXPANDING is set to TRUE or FALSE
-
->
cancatenation operator for macro processors.
-
macro processor deletes
all occurrences of the concatenation operator immediately after performing parameter substitution, so ? will not appear in the macro expansion
-
unique labels
avoids dups
-
Conditional Macro Expansion
modify the sequence of statements generated for a macro expansion, depending on the arguments supplied in the macro invocation. This is called conditional macro expansion
-
SET directive in macro
This SET statement assigns the value 1 to &EORCK
-
macros
&
Note any symbol that begins with the character & and that is not a macro instruction parameter is assumed to be a macro-time variable
All such variables are initialized to a value of 0
-
macro processor must maintain
a symbol table that contains the values of all macro-time variables used. Entries in this table are made or modified when SET statements are processed. The table is used to look up the current value of a macro-time variable whenever it is required.
-
It is extremely important to understand that the testing of Boolean expressions in IF statements
occurs at the time macros are expanded.
-
macro parameter specification
keyword parameters
-
each argument value is written with a keyword that names the corresponding parameter. Arguments may appear in any order.
-
recursive calls in macros
can be done if writing macro in a language that handles recursive calls.
-
preprocessors.
another name for microprocessors
-
Macro Processing within Language Translators
The simplest method of achieving this sort of combination is a line-by-line macro processor. Using this approach, the macro processor reads the source program statements and performs all of its functions as previously described
-
grammar.
- For the purposes of compiler construction, a high-level programming language is usually described in terms of grammar. This grammar specifies the form, or syntax, of legal statements in the language.
- The problem of compilation then becomes one of matching statements written by the programmer to structures defined by the grammar, and generating the
appropriate object code for each statement
-
tokens
may be thought of as the fundamental building blocks of the language
-
lexical analysis
The task of scanning the source statement, recognizing and classifying the various tokens, is known as
-
scanner
performs lexical analysis
-
parser.
This process, called syntactic analysis or parsing, is performed by a part of the compiler that is usually called the parser
-
The last step in the basic translation process is
the generation of object code. Most compilers create machine-language programs directly instead of producing a symbolic program for later translation by an assembler
-
grammer notation
BNF (for Backus-Naur Form)
A BNF grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
-
nonterminal symbols
Character strings enclosed between the angle brackets < and > are called nonterminal symbols (such as ‘<read>’ and ‘<id-list>’). These are the names of constructs defined in the grammar
-
terminal symbols
not enclosed in brackets
-
parse tree, or syntax tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree
-
Lexical analysis involves
scanning the program to be compiled and recognizing the tokens that make up the source statements. Scanners are usually designed to recognize keywords, operators, and identifiers, as well as integers, floating-point numbers, character strings, and other similar items
-
finite automaton
The tokens of most programming languages can be recognized by a finite automaton
-
Parsing techniques are divided into two general classes
bottom-up and top-down – according to the way in which the parse tree is constructed
-
Top-down methods parsing
recursive-descent parsing) begin with the rule of the grammar that specifies the goal of the analysis (i.e., the root of the tree), and attempt to construct the tree so that the terminal nodes match the statements being analyzed.
-
Bottom-up methods parsing
ex. operator-precedence parsing) begin with the terminal nodes of the tree (the statements being analyzed), and attempt to combine these into successively higher-level nodes until the root is reached
-
Operator-Precedence Parsing
This method is based on examining pairs of consecutive operators in the source program, and making decisions about which operation should be performed first.
-
shift-reduce parsing
Shift-reduce parsers make use of a stack to store tokens that have not yet been recognized in terms of the grammar.
-
recursive descent
The other parsing technique is a top-down method known as recursive descent. A recursive descent parser is made up of a procedure for each nonterminal symbol
|
|