rachel.cafe/_posts/2021-12-10-secd1.md

10 KiB
Raw Blame History

The SECD machine is a virtual machine designed to evaluate lambda expressions. Its purpose is to be a more natural goal for compilers than assembly while maintaining reasonable execution speed. In this blog post you will get a quick intro to SECD machines and an overview of a simple implementation.

{% katexmm %}

The SECD Machine

SECD stands for Stack Environment Control Dump, all of which but the environment are stacks in the SECD machine. The machine operates by reading instructions from the control stack which operate on itself and the other stacks. A lambda is implemented as its body coupled with its environment. Application is done with the “apply” instruction which pops a lambda off the stack and adds the next element of the stack to the lambdas environment environment, binding the variable of the lambda. The previous stacks are then pushed onto the dump stack. When the lambda has been fully reduced the return instruction is used to save the top of the stack, the result of the reduction, and restores the stacks from the dump stack.

The Modern SECD Machine

This approach, while sound, suffers from some performance issues. Thankfully, there are a number of improvements which can be made. The first of which is the use of De Brujin indecies. In De Brujin index notation each variable is a number indexing which lambda it was bound at (counting outwards). Here are some examples:

\lambda x.x \xRightarrow[]{\text{De Brujin}} \lambda \#0

\lambda f.\lambda g.\lambda x. f(gx) \xRightarrow[]{\text{De Brujin}} \lambda\lambda\lambda \#2(\#1\#0)

The benefit in using De Brujin notation is that, rather than a balanced binary tree, a stack may be used for the environment. A variable is then an index into the environment. An additional benifit is that the machine does not need to concern itself with alpha equivalence. As a side-note, counting conventionally starts at \#1, not \#0, however in a virtual machine counting from \#0 is, of course, much more natural.

The second improvement to be made is to “dump” the dump stack. Rather than using the dump stack the current state will be pushed onto the stack as a closure which the return instruction can then restore when exiting a lambda.

The third improvement is to introduce a new, static “stack”. The rationale for this is that many programs will include top level definitions which are in scope throughout the program. Including these definitions in the environment whenever a closure is created is a waste of both memory and processing power.

The fourth improvement is allowing multi-variable lambdas. In the case of the SECD machine using a multi-variable lambda rather than multiple single variable lambdas corresponds to not nesting closures, which would lead to redundant copies of the same environment being made. Note that this does not sacrifice currying, as will become clear in later sections.

An implementation of a simple SECD machine

In this section a SECD machine implemented in Haskell will be introduced bit by bit. This machine is by no means an efficient nor powerful implementation, it serves only to demonstrate the core concepts of the SECD machine.

Data structures

The machine will only be able to recognise two types of values. Integers and Closures.

data Val
  = I Int
  | Clo Int [Val] [Inst] -- number of arguments, environment, instructions
  deriving (Eq, Show)

The machine comes with the following instruction set.

data Inst
  = Const   Int
  | Global  Int
  | Local   Int
  | Closure Int Int -- number of args, number of instructions
  | App
  | Ret
  | Dup
  | Add
  deriving (Eq, Show)

The instructions

  • Const pushes a constant onto the stack.

  • Global pushes a global variable onto the stack.

  • Local pushes a local variable onto the stack.

  • Closure creates a closure taking a set number of arguments and consuming a set amount of instructions from the control stack. The resulting closure is pushed onto the stack.

  • App pops a closure and an argument from the stack. If the closure takes more than one argument the argument is pushed onto the closures environment and the closure is pushed back onto the stack. If the closure takes only one argument the closure will, rather than being pushed onto the stack, replace the current environment, such that the closures instructions are placed in the control stack, and its environment placed in the environment stack. A closure is then formed from the old control and environment stacks, which is pushed onto the stack.

  • Ret pops a closure in the second index of the stack and installs it as the current control and environment stacks. The element at the top of the stack remains untouched, yielding the result of an application.

  • Dup duplicates the element at the top of the stack. Its only included as a matter of convenience.

  • Add pops the top two elements off the stack, adds them, and pushes the result back onto the stack.

Instruction Table

All stacks grow from right to left, that is, the left most element is at the top of the stack.

Before After
Control Env Stack Control Env Stack
Const i : c e s c e i : s
Global i : c e s c e Globals[i] : s
Local i : c e s c e e[i] : s
Closure n a : c_1c_n : c e s c e Closure a {e} [c_1c_n] : s
App : c e Closure {e} [c] : v : s c v : e Closure^0 {e} [c] : s
App : c e Closure^n {e} [c] : v : s c e Closure^{n - 1} {v : e} [c] : s
Ret : c e v : Closure^0 {e} [c] : s c e v : s
Dup : c e v : s c e v : v : s
Add : c e v : s c e v + v : s

An example

Consider the following lambda expression:

(\lambda f.\lambda g.\lambda x. f(gx)) (\lambda x. x + 1) (\lambda x. x + 1) 0

It composes a lambda which increments its input by one with an identical lambda and then applies 0 to the result. In order to run this on our SECD machine it must first be rewritten with De Brujin index notation.

(\lambda \lambda \lambda \#2(\#1\#0)) (\lambda 1 + \#1) (\lambda 1 + \#1)0

Now in order to create an instruction tape for the SECD machine it must be rewritten, using Closure as lambdas, Local \#i for variables and keeping in mind to explicitly apply to and return from lambdas. Additionally it will have to be written in postfix form due to the nature of stack based machines. After performing these actions the expression has been transformed into the following tape:

Const 0 : Closure [ Const 1 : Local \#0 : Add : Ret ] : Closure [ Const 1 : Local \#0 : Add : Ret ] : Closure^3 [ Local \#0 : Local \#1 : App : Local \#2 : App : Ret ] : App : App : App

Reading this (or one of the previous lambda expressions) backwards you should be able to convince yourself of the validity of this translation.

This can now directly be translated into a list of instructions for the Haskell machine implementation.

[ Const 0, Closure 1 4, Const 1, Local 0, Add, Ret, Dup
, Closure 3 6, Local 0, Local 1, App, Local 2, App, Ret
, App, App, App]

Which can then be evaluated in order to yield the normalized expression.

λ eval [] [ Const 0, Closure 1 4, Const 1, Local 0, Add, Ret, Dup
          , Closure 3 6, Local 0, Local 1, App, Local 2, App, Ret
          , App, App, App]

[I 2]

As expected, the expression reduces to 2.

Whats next?

In the future I plan to create a specification for an instruction set which allows for meaningful computations as well as IO. Then I wish to define an instruction encoding and binary object file format. Once all this is done an efficient C implementation could be written. Ideally I would also write a compiler to said binary format from something a bit user friendlier. Perhaps an untyped, or simply typed lambda calculus.

The Haskell source

The implementation itself is trivial and is little more than a Haskell translation of the instruction table above. Its ugly, it doesnt do error handling and is little more than a demonstration piece.

module SECD where

data Inst
  = Const   Int
  | Global  Int
  | Local   Int
  | Closure Int Int -- args, scope
  | App
  | Ret
  | Dup
  | Add
  deriving (Eq, Show)

data Val
  = I Int
  | Clo Int [Val] [Inst] -- args env code
  deriving (Eq, Show)

vadd :: Val -> Val -> Val
vadd (I x1) (I x2) = I (x1 + x2)
vadd _ _           = error "attempted addition with closure"

data SEC = SEC [Val] [Val] [Inst] [Val] -- stack, environment, control, global
  deriving Show

-- | Takes a global env, a list of instructions and returns and int
eval :: [Val] -> [Inst] -> [Val]
eval globals insts = go (SEC [] [] insts globals)
  where
    go :: SEC -> [Val]
    go sec@(SEC stack env control global) = case control of
        [] -> stack
        (x:xs) -> case x of
          Const i     -> go (SEC (I i:stack) env xs global)
          Global i    -> go (SEC (global !! i:stack) env xs global)
          Local  i    -> go (SEC (env    !! i:stack) env xs global)
          Closure a s -> go (SEC (Clo a env (take s xs):stack) env (drop s xs) global)
          Ret         -> let (v:(Clo 0 e c):st) = stack in go (SEC (v:st) e c global)
          App         -> case stack of
                          (Clo 1 e c:v:st) -> go (SEC (Clo 0 env xs:st) (v:e) c global)
                          (Clo n e c:v:st) -> go (SEC (Clo (n-1) (v:e) c:st) env xs global)

          Dup         -> let (v:st) = stack in     go (SEC (v:v:st) env xs global)
          Add         -> let (x1:x2:st) = stack in go (SEC (vadd x1 x2:st) env xs global)

{% endkatexmm %}

Licensing Information

Feel free to use any of the code or concepts here in your own projects! The code can be considered free of copyright and all notions of liability and/or warranty.

See Also

Modern SECD Machine, University of Calgary - Gave me the idea to scrap the D stack.

SECD machine, Wikipedia