NAME

isa — input file format for the isa utility

DESCRIPTION

The isa utility is used to generate instruction stream encoders and decoders from a textual description of a machine instruction set. This manual page documents the form of the textual description accepted by the isa utility.

Basic Concepts

A machine instruction is composed of one or more tokens, each kind of token having a defined width. Simple RISC-like instruction sets have instructions that use 1 or 2 tokens, typically an instruction word and an optional immediate field. More complex CISC instruction sets may use many more kinds of tokens.

Each token is made up of fields. For example, an instruction token could be made up of an opcode field, additional fields naming registers, fields containing flags, immediate values, and so on. Fields may be named using the let directive, or can be unnamed. The bitslice operator ([]) can be used to denote specific portions of a token.

Non-overlapping fields are grouped together into fragments. Fragments may be composed using the “&” operator. The textual form of a fragment may be specified using the names directive.

A set of fragments that fully specifies each bit in a token is said to be ‘complete’. Only complete fragment sets can be emitted.
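The completeness rule can be illustrated with a short Python sketch. It models a fragment as a set of bit positions and checks that a group of fragments covers every bit of a token exactly once. The helper names bits and is_complete, the set-of-bits model, and the treatment of overlapping fragments as an error are assumptions made for this illustration; they are not part of the isa utility.

```python
def bits(high, low):
    """Return the set of bit positions in the inclusive slice [high:low]."""
    return set(range(low, high + 1))

def is_complete(width, fragments):
    """True if the fragments do not overlap and cover bits 0..width-1."""
    covered = set()
    for frag in fragments:
        if covered & frag:      # assumption: overlapping fragments are rejected
            return False
        covered |= frag
    return covered == set(range(width))

# A hypothetical 16-bit token split into an opcode, a flag bit and an offset.
opcode, flag, offset = bits(15, 13), bits(12, 12), bits(11, 0)
print(is_complete(16, [opcode, flag, offset]))   # True
print(is_complete(16, [opcode, offset]))         # False: bit 12 is unspecified
```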

Input Syntax

The semicolon “;” introduces a comment. All text from the semicolon to the end of the line is ignored.

The language uses indentation to specify scope (i.e., it uses the offside rule), as in the Python and Haskell programming languages.

Operators

Composing Fragments

The “&” operator is used to join fragments, forming a larger fragment. For example, to specify a fragment that is composed of two previously named fragments Rtop and Rbottom, use:

Rtop & Rbottom

Generators

A generator expression has the form [expr1|expr2...] and denotes a sequence of values expr1, where the additional expressions expr2 serve to define the range of values generated. Any “%”-escapes in expr1 are expanded. For example,

[R%n | n = 0..31]

generates the sequence R0, R1, ... , R31.
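The expansion performed by this generator can be approximated in Python as follows. The function expand is a hypothetical helper written for this sketch, and it handles only a single “%”-escape; how the isa utility itself performs the expansion is not documented here.

```python
def expand(template, var, values):
    """Expand the '%var' escape in template once for each value."""
    return [template.replace("%" + var, str(v)) for v in values]

# Equivalent of [R%n | n = 0..31]; the upper bound 31 is included.
names = expand("R%n", "n", range(0, 32))
print(names[0], names[1], names[-1])   # R0 R1 R31
```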

Numeric Ranges

The notation “..” denotes a numeric range. For example,

0..(2^16-1)

represents the numbers 0 to 65535, inclusive.
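Two details of this example are easy to miss when coming from other languages: both endpoints of the range are included, and “^” denotes exponentiation here. A minimal Python sketch of the same range follows; inclusive_range is a hypothetical helper, not an isa construct.

```python
def inclusive_range(low, high):
    """Return the integers low..high with both endpoints included."""
    return range(low, high + 1)

# isa: 0..(2^16-1). In Python, exponentiation is written ** (^ is XOR).
values = inclusive_range(0, 2**16 - 1)
print(values[0], values[-1], len(values))   # 0 65535 65536
```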

Sequences

Sequences of items are bracketed by square brackets “[” and “]”. For example,

let n = [ a b c d ]

Sequences can be given a local name using the name @ sequence syntax, for example:

bar@[ 1 2 3]

defines bar as a local name for the expression [ 1 2 3 ].

The “++” operator is used to concatenate sequences. These sequences must be of the same type.

Sequencing Tokens

The “<+>” operator separates tokens in sequence. For example, to specify an instruction that has two tokens T1 and T2 in sequence, use:

..the definition of T1.. 
<+> 
..the definition of T2..

Slices

Slices may be specified using the slice notation, namely name[highbit:lowbit], where highbit and lowbit are inclusive zero-based indices and name is the name of a token.

let Rsrc = instruction[3:0]

Sparse slices may be specified by separating slice expressions using commas. For example, bits 7 and 5 of the ifield token may be specified using:

ifield[7,5]
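The bit positions selected by contiguous and sparse slices can be sketched in Python as shown below. The helpers slice_bits and sparse_bits are hypothetical. The sketch also assumes that the first bit listed in a sparse slice is the most significant bit of the result, which this manual does not state.

```python
def slice_bits(value, high, low):
    """Extract the inclusive, zero-based bit range [high:low] from value."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)

def sparse_bits(value, positions):
    """Gather the listed bits; the first position is taken as most significant."""
    result = 0
    for pos in positions:
        result = (result << 1) | ((value >> pos) & 1)
    return result

token = 0b10100110
print(slice_bits(token, 3, 0))      # 6, the low four bits 0110
print(sparse_bits(token, [7, 5]))   # 3, bits 7 and 5 are both set
```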

Specifying Assembly Formats

The “<=>” infix operator is used to specify assembly language syntax and its mapping to sequences of fragments defined earlier; see the section Defining Assembly Syntax.

The “&*” operator indicates that all the named fragments in the LHS (the assembly syntax side) of the “<=>” operator should be treated as being present on the RHS. This operator allows instructions that have a simple one-to-one mapping between their assembly language definition and instruction encoding to be described succinctly. For example:

muls %Rd, %Rs <=> i[15:8] = 0b00000010 &*

Language Constructs

The input language has the following constructs:

arch string

Specifies the name of the instruction set architecture being processed.

arch myarch

cpus

Starts a block naming CPU identifiers. Specific instructions or groups of instructions may be flagged as being supported on sets of the CPUs so declared.

cpus 
  basic = [ CPU1 CPU2 ] 
  advanced = basic ++ [ CPU3 ]

token name (width)

Defines a token with name name and width width. For example, to define a 16-bit token named i (short for “instruction”), and an 8-bit offset token named o, use:

token i(16)   ; a comment here 
      o(8)

let name [params] = expression

Declares name as equivalent to expression.

names generator-expression

Defines the textual representation for a fragment. For example,

let Rsrc = i[3:0] 
      names [ R%n | n = 0..7 ]

specifies that a value of 0 for fragment Rsrc should be shown as R0, and so on. Conversely, when assembling text, the string “R7” would be translated to a fragment value of 7.
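The two-way mapping set up by the names directive can be pictured as a pair of lookup tables. The Python sketch below assumes that generated names are assigned to consecutive fragment values starting at 0; the variables to_text and to_value are illustrative only.

```python
# Names generated by [ R%n | n = 0..7 ]; the upper bound 7 is included.
names = ["R%d" % n for n in range(0, 8)]

to_text = dict(enumerate(names))                       # disassembly: value -> name
to_value = {name: v for v, name in to_text.items()}    # assembly: name -> value

print(to_text[0], to_value["R7"])   # R0 7
```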

where name [params] = expression

Like the let statement, a where statement introduces local definitions, except that the scope of these definitions is the statement preceding the where keyword. Example:

let Kimm6     = Kimm6high & Kimm6low 
    where Kimm6[5:4] = Kimm6high 
          Kimm6[3:0] = Kimm6low
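The effect of this definition is that a 6-bit value is divided between a 2-bit fragment and a 4-bit fragment. A Python sketch of that split follows; split_kimm6 and join_kimm6 are hypothetical helpers, and the position of the two fragments within the enclosing token is left out because the example above does not give it.

```python
def split_kimm6(k):
    """Split a 6-bit value into (Kimm6[5:4], Kimm6[3:0])."""
    if not 0 <= k < 64:
        raise ValueError("Kimm6 must fit in 6 bits")
    return (k >> 4) & 0b11, k & 0b1111

def join_kimm6(high, low):
    """Rebuild the 6-bit value from its high and low parts."""
    return (high << 4) | low

high, low = split_kimm6(0b101101)
print(bin(high), bin(low))              # 0b10 0b1101
print(join_kimm6(high, low) == 0b101101)   # True
```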

with fragment-definition

Defines fragment assignments that hold for statements in the scope of the with statement. For example,

with i[15:8] = 0b00000011 
  fmulsu %Rd, %Rs  <=> i[7,3] = [1,1] &*

Defining Assembly Syntax

Assembly syntax is described using the <=> operator. The form of the operator is

assembler-text <=> fragment & fragment & ...

The RHS of the <=> operator must specify a ‘complete’ fragment set, i.e., no bits should be unspecified in any of the tokens used in the RHS. The LHS of the <=> operator consists of literal text interspersed with fragment names. Fragment names are prefixed by the ‘%’ character. These fragment names in the LHS may refer to fragment names defined earlier, or may be new names that are local to the current definition.

For example, the following definition defines an instruction with mnemonic “rjmp”.

let reloffset  = i[11:0] 
    reljmpcall = i[12] 
in 
    with i[15:13] = 0b110 
       rjmp %label <=> reljmpcall = 0 & reloffset = (label - . - 1)

In this definition, the field label is a local fragment, one that is used to compute the value of the reloffset field in the instruction. In the RHS, the reljmpcall bit is defined as being 0. The rest of the bits in the token i are specified by the enclosing with statement.
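To make the bit layout concrete, the following Python sketch assembles the 16-bit token i that this definition describes. It rests on two assumptions that this manual does not state: that “.” is the address of the current instruction measured in token units, and that a negative offset is wrapped to a 12-bit two's complement value. The function encode_rjmp is hypothetical and is not generated by the isa utility.

```python
def encode_rjmp(label, here):
    """Build token i for 'rjmp label'; here stands in for '.' (assumed)."""
    reloffset = (label - here - 1) & 0xFFF   # i[11:0], wrapped to 12 bits
    reljmpcall = 0                           # i[12], fixed by the RHS
    opcode = 0b110                           # i[15:13], from the with statement
    return (opcode << 13) | (reljmpcall << 12) | reloffset

print(hex(encode_rjmp(label=0x10, here=0x10)))   # 0xcfff, offset -1
print(hex(encode_rjmp(label=0x12, here=0x10)))   # 0xc001, offset +1
```

Every bit of the token receives a value from one of the three fields, which is what makes the fragment set complete.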

HISTORY

The isa utility is scheduled to appear in a future release from the Elftoolchain project.

AUTHORS

The isa(1) utility was written by Joseph Koshy <jkoshy@users.sourceforge.net>.

BUGS

The isa utility is currently under development. The input format documented in this manual is likely to change in the future. If you intend to use this utility, please get in touch with the project's developers at ⟨elftoolchain-developers@lists.sourceforge.net⟩.