- Introduction
- Source language
- Virtual machine strategy
- Compilation strategy
- Unexplored extensions
- Playground & Implementation
- Ad: Are you interested in compiling structural subtypes?
In this cc, I'd like to present a small language based on the lambda calculus extended with stackful coroutines. I'll discuss a virtual machine designed for the language's execution, and an efficient compilation scheme.
We'll see that the compilation scheme
- naturally supports tail-call optimization
- eliminates all heap allocation of closures
- eliminates the need for indirect calls
without restriction of the language's expressiveness - which may be interesting in its own right.
This post is intended for anyone broadly interested in implementing coroutines, compilers, or virtual machines. I am not an expert in any of those, but I hope that something in this post may be useful to you. The post will assume basic familiarity with unification-based type inference and code generation schemes.
First, let me show you a playground exemplifying the language and its
compiler/VM. You can hover over values in the input buffer of the playground to
see inferred types. The default program comes from Cyber's playground,
computing the 20th Fibonacci number and how many times `fib`
was called.
Introduction
Recently, I've been studying the efficient implementation of effect handlers, which I currently believe are the most promising avenue for the future of managed effects in programming languages.
Effect handlers have a lot to do with coroutines - in fact, one way to implement effect handlers is as stackful coroutines, like OCaml does.
In this cc, I'll use the terms "coroutine", "green thread", and "fiber" all to refer to the specific idea of first-class, stackful, non-presumptive coroutines.
Specifically,

- Coroutines are segments of a program whose execution can be paused and
  resumed at will. They can be thought of as threads of execution controlled in
  userspace, rather than in the kernel.
- If you are familiar with goroutines, coroutines are like goroutines, except
  that
  - The point at which coroutines are preempted must be explicitly defined by
    the programmer; in the language we'll look at below, via the `yield`
    keyword.
  - Dually, coroutines do not imply a concurrent runtime; it is up to the
    programmer whether to, and how to, execute coroutines in a concurrent
    fashion. In the language we'll look at below, this is done via the `spawn`
    and `resume` keywords.
- A first-class coroutine can be used like any other value in a language,
  unrestricted in its behavior; think of first-class functions.
- Stackful coroutines can arbitrarily pause their execution, including
  within nested function calls. Stackful coroutines require their call
  stack to be preserved while they are suspended, in contrast to stackless
  coroutines. (C++ via `co_await`/`co_yield`, Rust via `async`/`await`, and
  JavaScript via `async`/`await` all implement stackless coroutines - with the
  restriction that you cannot suspend in a nested, non-coroutine call in those
  languages. The latter behavior induces the often-cited function coloring
  problem for async/await.)
- Non-presumptive coroutines are those that cannot be resumed more than
  once. In the language we'll discuss below, the `resume` keyword resumes
  execution of a coroutine, and returns a new handle to the coroutine. It is a
  runtime error to call `resume` on the same handle to a coroutine more than
  once.
- Coroutines are cooperative - unlike threads, which may be preempted
  arbitrarily by the operating system, coroutines require programmer
  intervention to specify where execution should be paused and resumed (think
  `async`/`await` in JavaScript/Python/Rust). Moreover, coroutines do not imply
  a concurrent runtime, and it is up to the programmer to define how to execute
  coroutines. Again, contrast this to goroutines, where the runtime of
  goroutines is well-defined and suspension points are largely implicit.
Why am I talking about coroutines here? I've seen a few good articles discussing implementing coroutines against a machine-language target (1, 2), and recent projects like Cyber and Co. I'd like to present a small model and implementation of coroutines against a virtual machine target, which may be a nice base for launching more experiments without designing a programming language intended for general use.
Co implements coroutines as continuations - which I believe is also the most promising technique for an efficient implementation of effect handlers, but I won't expand on that here. Continuations often go hand-in-hand with closures, which are typically heap-allocated and called indirectly. To show off something interesting, we'll also have our implementation compile closures unboxed, and eliminate all indirect calls via defunctionalization.
Finally, we'll see that tail-call optimization falls out naturally with other aspects of the implementation.
Okay, let's get started!
Source language
Our source language will be the simply-typed lambda calculus extended with
tuples and fibers. Fibers will be our source-language-level representation of
coroutines, and can be used like any other value. Fibers can be `spawn`ed,
`yield`ed, and `resume`d by the programmer to enable cooperative multitasking.
As an example, let's consider a program (taken from Cyber's
examples) that calculates the 20th
Fibonacci number in a fiber that yields to its parent fiber on every iteration.
This usage of `yield` and `resume` lets us compute how many times `fib`
recursively calls itself. I hope you can see how you could run more complicated
tasks concurrently in this style.
```ocaml
let rec fib = \n ->
  yield;
  if n < 2
  then n
  else (fib (n - 1)) + (fib (n - 2))
in
let rec exec = \state ->
  stat state.0
  | `Pending ->
    let fib1 = resume state.0 in
    exec {fib1, 0, state.2 + 1}
  | `Done n -> {state.0, n, state.2}
in
let runFib = spawn (fib 20) in
let result = exec {runFib, 0, 0} in
{result.1, result.2}
```
This language's syntax is very similar to OCaml's. I'll point out a few key features of its semantics:
- `let rec` defines a recursive binding; by default, `let` bindings are
  non-recursive.
- `{e1, e2, ..., en}` defines an n-ary tuple. Tuple elements are accessed by
  0-based indices; for example, `t.0`.
- Every program starts on a unique `main` fiber. A program terminates when the
  `main` fiber terminates, even if other fibers have not terminated (contrast
  this to non-joined kernel threads in e.g. C). It's up to the programmer to
  decide whether or not all child fibers should terminate (whether this is a
  good design is a different question).
- `spawn (fib 20)` executes the call `fib 20` on a new fiber, which can yield
  to the parent fiber (in this case the `main` fiber) arbitrarily. `spawn`
  returns to the calling fiber a handle to the child fiber. These handles have
  type `Fiber`. Fibers have two states - they are either in the pending state,
  or have terminated with a value.
- Fibers can be passed around arbitrarily, including to other fibers. Fibers
  are first-class values.
- `yield` suspends a fiber's execution and returns execution to the current
  parent fiber. Note that since fibers can be passed around arbitrarily, the
  parent of a fiber may change during a program's execution. `yield` is a no-op
  on the main fiber (I believe you could detect `yield`s that are only reached
  by the main fiber with a full-program static analysis, but it's not clear to
  me whether a more localized analysis would be suitable).
- `resume` returns execution to a child fiber, which executes until its next
  `yield` point or completion. Like `spawn`, `resume` returns a `Fiber`. It is
  a runtime error to resume the same `Fiber` twice - in our example above, for
  instance, it would not be legal to write

  ```diff
  - let fib1 = resume state.0 in
  - exec {fib1, 0, state.2 + 1}
  + let _ = resume state.0 in
  + exec {state.0, 0, state.2 + 1}
  ```

  (An affine type system could enforce this requirement at compile-time, but we
  won't discuss that here.) Resuming a completed `Fiber` is effectively a
  no-op; the `Fiber` is immediately returned as-is.
- A `stat` expression queries the state of a `Fiber`, and matches on the
  `` `Pending `` branch if the fiber has not yet terminated, or the `` `Done ``
  branch if it has. The `` `Done `` branch binds the return value of the fiber
  to a variable name.
The source language has four types:
- Integers
- Booleans
- Tuples composed of the language types, inductively
- Fibers composed of the language types, inductively
Virtual machine strategy
There are a lot of ways you could design a virtual machine for the language described above. Here, I'll show one design for a stack-based machine.
First off, let's define the stack part of the stack machine. Suppose that we can represent all our types as integers (tuples are just blocks of integers next to each other, and I'll explain the representation of fibers in a bit); then our stack is just a dynamically-growing array of integers. For the sake of simplicity we'll have the stack deal with 64-bit integers, rather than bytes. This is wasteful, but that's okay for this discussion.
We'll allow a program to arbitrarily push integers onto the stack, pop integers off the stack, and store integers into arbitrary indices in the stack.
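As a rough sketch (the names here are mine, not the playground's actual implementation), such a stack might look like:

```ocaml
(* A minimal sketch of the stack: 64-bit OCaml ints as cells. *)
type stack = { mutable cells : int array; mutable top : int }

let push (s : stack) (v : int) : unit =
  (* Grow the backing array when full. *)
  if s.top = Array.length s.cells then begin
    let bigger = Array.make ((2 * Array.length s.cells) + 1) 0 in
    Array.blit s.cells 0 bigger 0 s.top;
    s.cells <- bigger
  end;
  s.cells.(s.top) <- v;
  s.top <- s.top + 1

let pop (s : stack) : int =
  s.top <- s.top - 1;
  s.cells.(s.top)

(* Store into an arbitrary index, e.g. an offset from the frame pointer. *)
let store (s : stack) (i : int) (v : int) : unit = s.cells.(i) <- v
```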
Representing function frames
Since a function in our source language might have local variables referenced multiple times, it'd be nice to support some idea of local variables in the runtime representation of a function call frame.
The standard approach here is to keep track of a frame pointer in the runtime, which points to the place that was the top of the stack when a function call was entered (typically, when targeting machine code, frame pointers are stored in CPU registers). When a function is entered, it can ask the runtime to reserve some space on the stack that it will use for storing local variables. Then, those local values can be accessed relative to an offset from the frame pointer.
Example of compiling a function
```ocaml
let f = \x ->
  n = 1
  m = 2
  n + m
```
can compile to something like
```
f:
  sp-add 2          # reserve two cells on the stack, for n and m.
                    # n will live at fp[0], i.e. at the index of the frame pointer
                    # m will live at fp[1]
  push 1
  store-into fp[0]  # push, store n=1
  push 2
  store-into fp[1]  # push, store m=2
  push fp[0]
  push fp[1]
  add
  ret
```
Representing function calls
Before calling a function, the caller needs to
- allocate space for the callee's return value
- push on the callee's arguments (this includes any captured values, which will be discussed later)
- store the program counter of the caller (since this is global)
- store the frame pointer of the caller (since this is global, and will be modified in the new function frame)
We can have the virtual machine runtime take care of storing the program counter and frame pointer, and restoring them when a function is returned, but our compiler will need to take care of the first two line items.
In all, the procedure for calling functions is:
- Before call entry:
  - allocate space for the return value
  - push on the callee's arguments
  - push on the caller's program counter and frame pointer

    ```
    <return value>
    arg1
    ...
    argn
    @caller_pc
    @caller_fp
    ---             < stack top
    ```

- During call:
  - Set the new program counter and frame pointer for the callee

    ```
    <return value>
    arg1
    ...
    argn
    @caller_pc
    @caller_fp
    ---             < stack top = fp
    ```

- Upon return:
  - Reset the top of the stack to the callee's frame pointer. That means the
    caller's frame pointer is now the top value on the stack.

    ```
    <return value>
    arg1
    ...
    argn
    @caller_pc
    @caller_fp
    ---             < fp <-------\
    <local args>                 |
    <other stuff>                |
    ---             < reset to --/
    ```

  - Pop and restore the caller's frame pointer and program counter.

    ```
    <return value>
    arg1
    ...
    argn            < ----------\
    @caller_pc                  |
    @caller_fp                  |
    ---             < reset to -/
    ```

  - The caller deallocates the provided arguments, leaving the return value on
    the top of the stack.

    ```
    <return value>  <-------\
    arg1                    |
    ...                     |
    argn            < reset to -/
    ```
Here's how a full function call would be represented in the VM.
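As a sketch of what the runtime does for `Call` and `Ret`, building on the stack type from earlier (the `vm` record and helper names are illustrative):

```ocaml
(* Hypothetical VM registers: one stack plus global pc and fp. *)
type vm = { stack : stack; mutable pc : int; mutable fp : int }

let do_call (vm : vm) (target : int) : unit =
  (* The caller has already pushed space for the return value and the
     arguments. Save the caller's registers, then enter the callee. *)
  push vm.stack vm.pc;
  push vm.stack vm.fp;
  vm.fp <- vm.stack.top;
  vm.pc <- target

let do_ret (vm : vm) : unit =
  (* Drop the callee's locals and temporaries, then restore the
     caller's frame pointer and program counter. *)
  vm.stack.top <- vm.fp;
  vm.fp <- pop vm.stack;
  vm.pc <- pop vm.stack
```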
Representing fibers
In this model, each fiber is identified by a unique stack. Spawning a fiber allocates a new call stack. The virtual machine needs to bookkeep

- each unique fiber's call stack
- what the current stack of fibers is, relative to the main fiber

Spawning or resuming a fiber pushes onto the stack of active fibers. Yielding
pops the current fiber off the top of the stack, leaving the yielded fiber's
representation on the parent fiber's stack, as the return value of the matching
`spawn` or `resume`.
Because our fibers cannot be resumed multiple times, we also need to keep track
of a dirty bit in both the runtime representation of a fiber's stack, and the
`Fiber` value we return to the parent fiber. When a fiber is resumed, we check
that the handle's dirty bit lines up with the fiber's dirty bit; if these values
are out-of-line, we know the fiber has already been resumed and can't be resumed
again. Note that the dirty bit would not be needed if the type system were to
enforce that a fiber can be resumed at most once. It also would not be needed if
multi-shot resumptions were supported; one strategy to enable that is to create
a new fiber on each resumption, and reference-count fiber stacks.
Put all together, we can represent a `Fiber` that terminates with type `T` as
the runtime tuple

```ocaml
{ bit: int, value: T, fibidx: int, fibdirty: int }
```
where `bit=1` when the fiber is done, and 0 otherwise; `value` is the value the
fiber terminated with, or else undefined; `fibidx` is the unique reference to
the fiber's call stack; and `fibdirty` is the fiber's dirty bit.
The values in the runtime tuple cannot be accessed directly from our source
language. In fact, `fibidx` and `fibdirty` are purely implementation details.
`bit` and `value` can be accessed conditionally, via the `stat` keyword.
Representing tuples
Speaking of `stat`, it's worth noting how tuples (including `Fiber`s) are
represented on the stack. The directionality is somewhat arbitrary, but here
we'll have lower-indexed elements in a tuple stored at the top of the stack. For
example, a `Fiber<{int, int}>` will be stored as

```
fiber.fibdirty
fiber.fibidx
fiber.value.1
fiber.value.0
fiber.bit       < stack top
```

This has the advantage of making `stat`'s job easier - once a fiber is loaded
onto the top of the stack, the pending/done bit can be immediately popped off,
and if needed, the terminating value is right on top of the stack.
Instruction set
With all that described, we can construct an instruction set for our virtual machine.

```ocaml
type op =
  | Eq
  | Lt
  | Sub
  | Add
  | Mul
  | Yield
  | Spawn of { proc : label; args_size : int; ret_size : int }
  | Resume of int (** size of return value *)
  | Push of locator
  | Store of int (** store-into fp[offset] *)
  | SpAdd of int
  | SpSub of int
  | SpRestoreFp (** restore to the frame pointer. Used for tail calls *)
  | Jmp of label
  | Jmpz of label
  | Jmprel1 (** jump relative to 1 + the integer on the top of the stack. *)
  | Call of label
  | Ret

type basic_block = label * op list

type proc = {
  name : label;
  blocks : basic_block list;
  debug_frame : debug_frame;
}
```
Instructions are organized into basic blocks, which are identified by a label and contain zero or more instructions that are executed in a linear fashion. A basic block only branches out via a jump or return at its exit. Procedures consist of a name and one or more basic blocks.
Most of the bytecode instructions are standard, though there are a few interesting ones:
- Along with taking the procedure to call, `Spawn` requires the size of the
  procedure's arguments. That's because the arguments will need to be copied
  over from the parent stack into the stack allocated for the child fiber.
- Both `Spawn` and `Resume` require the size of the fiber's termination value,
  because upon yield or termination, the runtime will need to know how large to
  make the `Fiber` value representation.
- `SpAdd` allocates space on the stack (for local arguments and return values),
  `SpSub` deallocates space on the stack (so callers can reclaim space used for
  call arguments), and `SpRestoreFp` restores the top of the stack to the
  current frame pointer. We'll see that the latter instruction is useful for
  implementing tail call elimination; it is otherwise unnecessary, since such
  restoration happens by the runtime during a return.
- `Call` requires a named procedure, and there is no support for indirect
  calls. This is not a joke, and there is no restriction here - we'll see below
  how to compile `co_lc`, unrestricted in the kinds of closures you can create,
  with only direct calls.
Compilation strategy
Type inference
There are a few type systems `co_lc` could use; here we use the type system of
the simply-typed lambda calculus, with complete type inference via unification.
Here's one implementation of `co_lc`'s inference algorithm;
I won't go into depth about it here, but the approach is pretty standard.
A tip for inference of tuples
One thing that may be interesting, which I don't typically see covered in discussions of unification-based inference, is how to deal with the inference of tuples. The question has to do with an expression like

```ocaml
let f = \t ->
  t.1 == 10
```

Clearly, `t` is a tuple with an arity of at least two, where the second element
is an `int`. But the precise arity of the tuple, and its other elements' types,
cannot be known at this point.
One solution is to have both dense representations of tuple types (for when a tuple literal is seen), and a sparse representation for cases like the above. That is, our type definition might look like
```ocaml
type ty =
  ...
  | TupleDense of ty list (* list of element types *)
  | TupleSparse of (int * ty) list (* list of known indices' types *)
  ...
```
When a `TupleSparse` unifies with a `TupleDense`, their respective elements (and
arities) must unify, and then the `TupleSparse` must become a `TupleDense`,
since that is the point of the concrete type's instantiation.
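In code, the interesting unification case might look roughly like this (a sketch: `unify` on element types is passed in, and error handling is reduced to an exception):

```ocaml
(* Sketch: unify a sparse tuple's known indices against a dense tuple,
   then instantiate the sparse type as the dense one. *)
let unify_sparse_with_dense unify (sparse : (int * ty) list) (dense : ty list) : ty =
  List.iter
    (fun (i, t) ->
      match List.nth_opt dense i with
      | Some t' -> unify t t'
      | None -> failwith "tuple index out of arity")
    sparse;
  TupleDense dense
```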
Eliminating indirect calls and heap-allocated closures via defunctionalization
I did promise no indirect function calls, and I must deliver. We'll use type-directed defunctionalization, via lambda sets, to guarantee that all calls can be compiled directly (defunctionalization and devirtualization are typically done as optimizations in later stages of a compiler; however, by embedding call information in the type system, we can guarantee the optimization is applied), with conditional dispatch via a jump table rather than a function pointer.
Lambda sets are due to William Brandon, et al., in an unpublished manuscript. Lambda sets are also used in Roc to compile closures without indirect calls or heap allocation.
Why does eliminating indirect calls matter?
An indirect call occurs when a machine has to load the address to call at runtime, rather than it being statically known.
Function pointers
In our source language, it's reasonable to write something like
```ocaml
let caller = \f -> f 1 in
let id = \x -> x in
caller id
```
A direct compilation of this code might produce bytecode like
```
id: ...
caller:
  # suppose the argument is at fp[-3]
  push 1
  push fp[-3]
  call-indirect
  ret
main:
  push &id
  call caller
  ret
```
where the address of the function passed to `caller` has to be loaded, popped,
and dereferenced at runtime in order for `call-indirect` to determine what
function should ultimately be called. It's clear that in a program like the one
above, a smarter analysis could reduce the compiled code to
```
id: ...
caller:
  push 1
  call id
  ret
main:
  call caller
  ret
```
The advantages of the latter include

- The call site is statically known, enabling further compiler analyses and
  optimizations, like inlining `id` in `caller`.
- If this were compiled to target a CPU, there would be no reliance on the
  CPU's branch target predictor at runtime.
The biggest downside of this approach I'm aware of is that if `caller` is
called multiple times with unique functions, either

- a unique copy of `caller` must be stamped out for each unique argument
  (monomorphization, increasing code size)
- or, direct calls can be used, but must be guarded by a jump table
Speaking of jump tables -
Conditionally-determined functions
A second case has to do with the representation of conditionally-determined functions. Let's extend our previous example to
```ocaml
let caller = \f -> f 1 in
let id = \x -> x in
let mul2 = \x -> x * 2 in
caller (if 0 == 0 then id else mul2)
```
Again, a direct compilation of this code might produce bytecode like
```
id: ...
mul2: ...
caller:
  # suppose the argument is at fp[-3]
  push 1
  push fp[-3]
  call-indirect
main:
  push 0
  ifz jmp else
then:
  push &id
  call caller
  ret
else:
  push &mul2
  call caller
  ret
```
As before, this compilation determines a function pointer to dispatch to, and
passes that into `caller`. An alternative is to push down the conditional into
the body of `caller`, enabling a direct call within each branch of the
conditional.

```
id: ...
mul2: ...
caller:
  push 1
  # suppose the argument is at fp[-3]
  push fp[-3]
  ifz jmp call_mul_
call_id_:
  call id
  ret
call_mul_:
  call mul2
  ret
main:
  push 0
  call caller
  ret
```
The advantage here is that we've eliminated one source of indirection - we still need a conditional, but we've gotten rid of function pointers. This may make inline analyses more fruitful, and places less pressure on a CPU's branch target predictor.
One thing I'm not sure about is the efficacy of the jump-table-based `caller`
implementation relative to the function-pointer-based one, when `caller`'s
argument is static. In that case, the main performance considerations would be

- for the function-pointer-based `caller`, how long a dereference takes, and
  how well branch target prediction works on a CPU
- for the jump-table-based `caller`, how long a conditional and jump take, and
  how well branch prediction works on a CPU
I am not aware of any empirical studies on the efficacy of CPUs' branch predictors vs. their branch target predictors (though there is this nice analysis by Cloudflare) - please let me know if you know of any!
Eliminating heap-allocated captures
Lambda sets also enable a natural way to eliminate heap allocation of closure captures. This is pretty nice, especially in functional paradigms where closures are common.
Let's consider a program along the lines of the following sketch (the branch conditions and literals here are illustrative; the playground's exact example may differ):
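```ocaml
let x = 1 in
let y = 2 in
let z = 3 in
let f = \t ->
  if t == 0
  then \w -> w + x
  else if t == 1
  then \w -> w + x + y
  else \w -> w + x + y + z
in
f 0
```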
where `f` returns a function that either captures `x`; `x` and `y`; or `x`,
`y`, and `z` (notice that `f` itself must capture all three). At runtime, the
captures must live alongside the representation of the function to call - which
means that the runtime representation of the captures must also be the same
size, no matter what function is returned (because the return value of `f` can
be used indiscriminately, including being passed around to arbitrary locations,
it must have a statically-known size).
The typical way this requirement of uniformity is achieved is by compiling closures to a representation like the following C struct (of course, one alternative is to disallow the conditional selection of a closure; C++ and Rust enforce this restriction by giving each closure a unique type, which is incompatible with every other closure type)
```c
struct closure {
  void* fn_addr;
  void* captures;
};
```
where each capturing function takes an opaque `captures` pointer as an argument,
which it can then unpack to the concrete type of captures it knows.
For the function `f` returns, this is needlessly generic - `f`'s returned
closure can have the stricter representation

```c
struct captures1 { int x; };
struct captures2 { int x; int y; };
struct captures3 { int x; int y; int z; };

union f_closure_captures {
  struct captures1 c1;
  struct captures2 c2;
  struct captures3 c3;
};

struct f_closure {
  int tag; // 1, 2, or 3
  union f_closure_captures captures;
};
```
That is, we can keep a tag of three states telling us which function to dispatch to (this is the elimination of function pointers discussed in previous sections). And, we can reduce the representation of captures to a three-state union - which can all be stored on the stack, at the expense of paying for the largest captures size!
As before, we are making tradeoffs here. While this helps avoid heap-allocation, it means that
- The smallest captures in a representation like the above take as much space on the stack as the largest captures, which can be exceptionally wasteful if the size discrepancy is large.
- This scheme makes incremental and separate compilation more challenging, as the addition of a new function can change the memory representation of values far away.
Inference and compilation semantics of lambda sets
Actually, the `f_closure` struct representation I gave above is exactly where
lambda sets take us! If you already see how to get there, feel free to skip this
section.

I must again say that the original idea of lambda sets in this type-directed form is due to William Brandon, et al., in an unpublished manuscript. We've also used them in Roc, and learned a lot about their behavior (many potential blog posts about that!).
The idea is

- Each syntactic function (a lambda `\x -> ...`) gets a unique name.
- When inferring the type of a function, include the name of that function and
  its set of captures in a set adjacent to the function type; this is the
  lambda set. For example,

  ```ocaml
  x = 1
  y = 1
  f = \n -> n + x + y
  ```

  may be given type `int -[f {x: int, y: int}]-> int`, where
  `[f {x: int, y: int}]` is the lambda set.
- When two functions unify, their lambda sets union (a sketch of this rule
  follows the list).
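As a rough sketch of that last rule, suppose lambda sets are represented as an association list from lambda names to their capture types (an illustrative representation, not the actual implementation):

```ocaml
(* Sketch: union two lambda sets. A lambda appearing in both sets must
   also have its captures unified; that part is elided here. *)
type lambda_set = (string * ty list) list

let union_lambda_sets (s1 : lambda_set) (s2 : lambda_set) : lambda_set =
  let only_in_s2 =
    List.filter (fun (name, _) -> not (List.mem_assoc name s1)) s2
  in
  s1 @ only_in_s2
```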
Take a look at the inferred types of our example from above. Feel free to
hover over `f`, or look at the elaborated output.

Notice that the inferred type is saying that `f` returns a function of type
`int -> int` whose lambda set is `[lam {x} | lam1 {x, y} | lam2 {x, y, z}]` -
exactly the three functions that we might dispatch to, and what their captures
are! This is all we need in order to compile the return value of `f` to the
`f_closure`-like representation described previously.
Compiling functions, including lambda sets
Put all together, our compilation scheme for functions consists of

- When compiling a function (e.g. `lam2` of the example above):
  - Determine the stack size of its captures based on the largest captures of
    any lambda in the lambda set it's involved in.
  - Compile the function to unpack the passed captures appropriately.
  - For the runtime representation of the function-to-call, store the captures
    on the stack.
  - If the lambda set of the function is non-unary, the function to call must
    be determined conditionally; store a tag for this function on the top of
    the stack (e.g. `2`, since `lam2` appears at index 2 of the set).
- When calling a function:
  - Load the argument, then the function. That way, the stack consists of the
    argument, followed by any captures, followed optionally by the function tag
    on the top of the stack.

    ```
    <arg>
    <captures>
    <tag if non-unary lambda set>   < stack top
    ```

  - If the lambda set is unary, there is only one function that can be called -
    compile the direct call.
  - If the lambda set is non-unary, build a jump table (one implementation)
    using the integer tag of the lambda on the top of the stack. Each branch of
    the jump table can perform the appropriate direct call to the matched
    function, as in the sketch after this list.
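Here's a sketch of the call-site compilation, with hypothetical `emit`, `start_block`, and `fresh_label` helpers passed in, and the argument and captures already loaded as shown above:

```ocaml
(* Sketch: compile a call through a lambda set. For a non-unary set,
   the lambda's integer tag sits on top of the stack. *)
let compile_lambda_set_call ~emit ~start_block ~fresh_label lambdas =
  match lambdas with
  | [ (proc, _captures) ] ->
      (* Unary lambda set: the callee is statically known. *)
      emit (Call proc)
  | _ ->
      (* Jmprel1 pops the tag and jumps (1 + tag) instructions ahead;
         each landing site jumps to a block that does a direct call. *)
      let branches = List.map (fun lam -> (lam, fresh_label ())) lambdas in
      let join = fresh_label () in
      emit Jmprel1;
      List.iter (fun (_, l) -> emit (Jmp l)) branches;
      List.iter
        (fun ((proc, _captures), l) ->
          start_block l;
          emit (Call proc);
          emit (Jmp join))
        branches;
      start_block join
```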
Tip: deciding where to store values
I don't know if this is well-known, but one thing I find helpful during compilation is to have the procedure that compiles an expression take a parameter indicating where the expression should be compiled, rather than (in this case) always compiling to the stack and storing the result elsewhere later. This eliminates a lot of trivially-reducible loads and stores.
For example, my implementation has the recursive bytecode compiler take an optional target destination, which looks like
```ocaml
type opt_target =
  [ `Any
  | `FpOffset of int (** store as a local offset from the frame pointer *)
  | `Stack (** store on the top of the stack *) ]
```
Sometimes, the target truly can be arbitrary. For example, when compiling a tuple access, we don't really care if the tuple is loaded onto the top of stack or is in an offset from the frame pointer - in the former case we can pop values off until we get to the field we're looking for, and in the latter case, the access can be made relative to where the tuple lives.
To facilitate the arbitrary-target case, the procedure also returns where
exactly the expression ended up being stored, as a type that's `opt_target`
without the `` `Any `` variant.
Tail-call optimization
Passing an optional target makes tail-call optimization very natural - when
we're compiling a function, we set up the target of the function body to be a
special `return` local, which is the location where the caller allocated space
for the return value.

During the compilation of a function, if we see a call to the same function
whose target is the `return` local, we know that we can perform a tail-call
optimization! Thanks to lambda sets, the destination of a function call is in
the function type, so only the type of the function being applied needs to be
examined to understand whether the optimization can be made.
To actually perform the optimization, compile the call argument into the
location of the argument in the current function frame, reset the stack pointer
to the frame pointer, and jump back up to the top of the function. See, I
promised a use of `SpRestoreFp`!
Unexplored extensions
Thanks for making it this far! You must be pretty interested in this kind of stuff. There are various natural extensions to this project, which may make for interesting projects:
- Make these fibers do something interesting. Our language and compiler are enough to show off that the implementation of fibers works, but don't actually demonstrate its use in an interesting application. One idea would be to add support for making non-blocking network requests, and show an example of cooperative multitasking where multiple fibers fetch many webpages.
- Implement an affine type system to enforce the one-shot behavior of our fibers at compile-time.
- Design and implement an analysis to detect what spawned expressions will never yield. Those expressions can be optimized to happen as direct evaluations, rather than needing to happen on a child fiber.
- Design an efficient scheme for multi-resumable fibers.
- Design a system for implicitly-yielding userland threads, and a preempting runtime scheduler of userland threads. Modify the VM and compiler to support this system.
- Implement additional optimizations over the virtual machine, and optimizations of the bytecode in the compiler.
- My virtual machine is implemented in OCaml (and compiled to JavaScript via JSOO). Using a tighter language, like C/Rust/Zig/whatever, for the VM (and compiling to WASM) may result in a much faster machine.
Playground & Implementation
I hope you enjoyed this post, and learned something. You can find my implementation of the language compiler and VM on GitHub. As you saw previously in this post, I've also made a playground for the language, embedded again below.
Ad: Are you interested in compiling structural subtypes?
If you know of ways to compile, or are interested in compiling, languages based on structural subtyping to non-uniform representations, please email me! This has been an ongoing project of mine; I hope to write about it soon, and would love to chat it through with others who are interested.
In particular, the idea here is to combine
- the programming flexibility of structural subtyping
- (non-principal, possibly incomplete) type inference, à la MLsub
- row polymorphism and polymorphic variants
to design subtyping-based source languages with compilation to target languages that
- do not support runtime object polymorphism (i.e. all procedures are monomorphized)
- have non-uniform, unboxed representations
- do not require introducing implicit conversions at re-binding or usage sites