Home DSL and AST
Post
Cancel

DSL and AST

Domain specific language

In program synthesis, choosing an appropriate program notation is crucial. A more narrow notation that only precisely captures the relevant program space increases the efficiency of program search. Domain specific language (DSL) refers to specialized languague notations that define these relevant programs with a limited set of functions. DSL is convenitent to be described with functional programming which is more consise and expressive than general purpose programming languages.

Abstract syntax tree

A parse tree represents a program syntax, it is usually described in context free gramma (CFG). The following parse tree CFG stands for the syntax a simple arithmetic language:

1
2
expr := term | term + expr
term := (expr) | term * term | N

The gramma captures syntaic information: * has a higher precedence over +,e.g., 5 + 3 * 2 is an expression that can only be decomposed into term + expr where term = 5 and expr = 3 * 2 as the gramma does not allow the term to be the addition of 2 expr, and then the expr is deemed as a term which is a multiplication of 2 other terms 3 and 2 (I think there the DSL also specifies term + expr != expr + term). The decomposition of the given expression starts by searching a pattern that fits and then decompose the expression recursively based on the pattern. However, the expression (5 + 3) * 2 is deemed as a big term, which is a multiplicaiton of 2 terms where one term is (expr).

An abstract syntax tree (AST) of the same arithmetic language ignores syntaic (e.g., precedence, parenthesis) details. AST represents code as a data structure:

1
data AST = Num Int | Plus AST AST | Times AST AST

The AST representation removes the parenthesis and the distinction between a term and an expression. While (5 + 3) * 2 and 5 + 3 * 2 are still distinguishable as they are constructed differently. E.g., (5 + 3) * 2 = Times (Plus 5 3) 2 and 5 + 3 * 2 = Plus 5 (Times 3 2).

The CFG version of the above AST is:

1
expr := N | expr + expr | expr * expr

In summary, AST is another layer of abstraction of parse tree to remove more syntaic details while still captures the same program space. However, it could be hard to find a closed data structure for arbitrary program space.

DSL vs AST (in my opinion)

  • DSL

    • a representation of program space

    • a pool of pre-defined components which can be used to construct concrete programs

    • can be represented with CFG

  • AST

    • a syntax tree which specifies whether a concrete output/expression can be produced by the program (given the input)

    • reduced representation of a concrete program

    • can be represented with CFG

This post is licensed under CC BY 4.0 by the author.