Language definition from EBNF

Hi guys,

I'm looking to write a DSL for a language for whose syntax I already have an EBNF.  Since BNFs of various kinds are such a common, compact way of defining the syntax of a language, it seemed that implementing such a thing in MPS should be a well-trodden path. But I haven't found much on the subject in reading through your documentation and watching a variety of your videos.  Can you point me to any resources on this, or describe any useful, high-level patterns?

Of course, the non-terminals can be mapped to concepts, and the terminals to enumerable data types.  But how, for example, does one define a non-terminal as a (typically recursive) sequence of non-terminals?  I can imagine cobbling something together with constraints and type-checking, but is there a cleaner mapping from an EBNF grammar specification to an MPS grammar spec?

Thanks in advance for any guidance you might be able to provide.

Best,

Steve

9 comments

Hi Stephen,

Ever since starting with MPS I have been looking at ways to bootstrap new languages into MPS.
One (hobby) project I started (metabnf) together with my colleagues can be found in our repository
on DSLFoundry (github.com/DSLFoundry/mps-metabnf).

The goal of this project is to transform (augmented) EBNF into an MPS language. From the EBNF notation
we derive:
- a concept hierarchy
- editors
- text generators

Indeed as you mention, a non-terminal would be mapped onto a concept. Terminals are typically mapped to
constrained datatypes (with special support for ids to generate INamedConcepts). Any terminal occurring
in the right-hand-side of your production rule will be a property while any non-terminal in the
right-hand-side will be a child (or sequence of children). Alternatives are utilized to generate interfaces
and for extending concepts (at times intermediate concepts may need to be generated for this).

As for recursive sequences, I typically rewrite these to utilize the one-or-more, zero-or-more and zero-or-one
extensions that EBNF adds to BNF. It takes a bit of massaging the grammar, but that makes it way easier to
generate a neat concept hierarchy.

There is however an issue with generating a concept hierarchy like this. EBNF grammars are typically constructed
for parsers, not for interaction. Hence, the resulting concept-hierarchy does not take usability into account.
Language grammars are specified to eliminate ambiguous rules, not to have sane completion in MPS.

I usually address these issues by, again, massaging the grammar into a better shape and by adding usability
extensions to metabnf.

If you want to take a look at it, it is implemented for MPS 3.1 and has been on hold for a while (other priorities).
The project is certainly not dead, but it really needs some more attention.

Regards,
- Remi Bosman

0

Hi Remi,

Thank you very much for your detailed and timely response.  

Metabnf is exactly the sort of thing I had in mind.  I'll need to study it.  (After I figure out how to open it as a sample language in MPS 3.3!)

You mention mapping non-terminals on the right-hand side of production rules to children, or to sequences of children.  How does that work for recursive rules, where the non-terminal on the right is the same as the non-terminal being defined?  In MPS, can a concept be a child of itself?  

(I'm a bit confused about what exactly a "child" is in MPS.  In the examples, it seems that children are sometimes instances of their parent, and sometimes constituents of their parent.  An example of the latter would be a conditional node, whose antecedent would be a child of the conditional.  Of course, there's nothing inherently wrong with making an AST link mean different things at different times, but it might be a little better explained.)

Also, you mention rewriting recursive sequences with "*", "+", etc.  Is the point here just to support a direct conversion to MPS' cardinality constraints, or are there other benefits?

I take your point that BNFs are designed for purposes other than the usability of the resulting concept hierarchy. So a certain amount of massaging will always be necessary.  Still, one would think a utility like metabnf would be a really useful addition to MPS, since, as I say, BNFs are such a common way of specifying the syntax for a language.  (As witness all the parser generators (Antlr, etc.) that let you build a parser for a language from a BNF for the language.)

Once again, thanks so much for your informative response.

Best,

Steve  

0

Hello Steve,

> Metabnf is exactly the sort of thing I had in mind. I'll need to study it.
> (After I figure out how to open it as a sample language in MPS 3.3!)

Heh, indeed. Sorry about that. We really need to migrate that project to a later MPS version.
I'll try to find some time, but it will probably be some weeks.

In MPS concepts can declare to have children of the same type as the concept itself.
So for recursive rules that would lead to a concept with children of the same type.

The reason I used "*" and "+" sequences is that it helps keeping the reduction rules simple
(less case analysis etc...). Furthermore, using explicit sequences puts more semantic knowledge
into the grammar which means it becomes easier to generate a language with good usability.

If I recall correctly, then automatic recognition of recursive rules is not
implemented yet, but it could be realised as a refactoring at the grammar level.
A recursive rule generates a different model (i.e. with recursively nested nodes)
than a concept using the cardinalities (which is cleaner in my opinion).

Naturally, I agree on the usefulness of a language like metabnf (once it becomes production ready).
While initial usability might not be optimal, the long-term idea is to capture common
usability patterns as extensions to the metabnf grammar definition itself.

Some examples.

For the sake of defining a concept hierarchy the concrete syntax does not really matter
(only the abstract syntax). However, for editors and text generators they do play a role.
In case a production rule starts with a keyword, it immediately becomes a candidate for
an alias, thus improving usability.

Another example, consider e.g. a Java Method:
public static void Foo () {...}

Here the keyword 'static' is optional. Metabnf supports the concept of a 'Flag' which is a
special type of toggleable keyword. By marking static as a flag, we can have boolean
attributes to toggle it on/off and (not implemented yet) generate intentions to make it
static/non-static.

It would be a great extension to automatically generate all left/right transformations for
expressions. Again, by extending BNF with special concepts we can eventually support this.

Last, but not least, the grammar could be utilized to generate a text parser such that we
can also copy/paste plain text and import textual files.

The typical intended usage scenario would be to copy/paste a grammar into MPS.
Then modify it a bit such as adding proper (significant!) indentation, replacing recursive
rules with sequences, and so forth. Then generate an MPS language from that.

Regards,
-Remi.

0

Hi Remi,

That migration procedure looks forbidding!

But I think I'm beginning to understand.  Simplifying the Expression sample a bit, part of it might be represented in BNF form as:

1) SimpleExpression: UnarySimpleExpression | BinarySimpleExpression
(2) UnarySimpleExpression: OneOperator SimpleExpression
(3) OneOperator: "not"
(4) BinarySimpleExpression: SimpleExpression TwoOperator SimpleExpression
(5) TwoOperator: "and" | "or"

Then, in MPS:
(1') UnarySimpleExpression and BinarySimpleExpression extend (but are not children of) SimpleExpression
(2') UnarySimpleExpression gets one child, a SimpleExpression
(3') OneOperator becomes an alias of UnarySimpleExpression, to be used in the editor
(4') BinarySimpleExpression gets two children, both of type SimpleExpression
(5') TwoOperator becomes an alias of BinarySimpleExpression, to be used in the editor

The two terminals become aliases. Maybe that's because they're enum data types; in the more general case, where they're constrained data types, maybe they should become properties, as you suggest in your original response?

In any case, thanks so very much for taking the time to respond.  It's really been very helpful.

Best,

Steve

0

Migration from 3.3 seems to fail, but MPS 3.2 can migrate it to a newer version.
Some code has changed under the hood and therefore metabnf needs to be adapted.
So the migration path looks like: 3.1 -auto-> 3.2 --manual fix--> 3.2 -auto-> 3.3

Yes, your description of the mapping looks about right.

Though it is not the preferred AST. Better to have the 'and' expression and the 'or' expression extend BinarySimpleExpression. That way they can have their respective aliasses set properly can each have their own reduction rules. Likewise, you would prefer the UnarySimpleExpression to have the alias 'not' instead of 'OneOperator'.

E.g.
Expr ::= UnaryExpr | BinaryExpr
UnaryExpr ::= NotExpression | ...
BinaryExpr ::= AndExpr | OrExpr | ...

NotExpression ::= "not" SimpleExpression
AndExpression ::= Expr 'and' Expr
OrExpression ::= Expr 'or' Expr

My intended extension for metabnf is to allow this:

BinaryExpr ::= Expr BinaryOp Expr

BinaryOp ::= 
alias | description         | level
------------------------------------------
&&   | conditional AND |     12
||      | conditional OR   |     13

For that, I need to add a new production rule which generates expression hierarchies from operator tables.

I need to continue this project, it is just way too much fun to neglect :) 

0

That extension to metabnf would be pretty cool!

To recap your advice on how to get a set of MPS concept definitions from a BNF:

(1) Start with your initial "raw" BNF, e.g.:

Expr ::= ... | 'not' Expr | Expr 'and' Expr | Expr 'or' Expr ...

(2) Expand this by adding auxiliary, intermediate concepts.  Their purpose is: (a) to avoid direct recursion; (b) to support distinctions that will be necessary elsewhere (e.g., in reduction rules); and (c) to reduce the step-size, as it were, between concepts, in order to support a smoother, more user-friendly dsl.  

(It would seem that, of these three, only (a) is straight-forwardly mechanizable.  (b) depends at least partly on the purposes to which the dsl will be put, and (c) depends on the initial "step-size" between non-terminals, the abilities of those who will use the dsl, etc.)

In any case, having figured that out we'd augment our raw BNF.  Here, we might add: UnaryExpr, BinaryExpr, NotExpr, AndExpr and OrExpr.  Per your example, we'd get:

Expr ::= UnaryExpr | BinaryExpr
UnaryExpr ::= NotExpr | ...
BinaryExpr ::= AndExpr | OrExpr | ...

NotExpr ::= 'not' Expr
AndExpr ::= Expr 'and' Expr
OrExpr ::= Expr 'or' Expr

(3) From this we can generate MPS concept definitions more-or-less mechanically.  The non-terminals on the right side become extensions of the concept on the left, except for (indirect) recursive references, which become children of the concept on the left.

I'm unclear about terminal symbols.  In this simple example they become aliases.  When there are more than one, I suppose they might become names of children?  For example, in the rule:

IfExpr ::= 'if' Expr 'then' Expr

Could "if" become the name of the antecedent child Expr?  At any rate, in the default case they become properties of the concept on the left.

If we stipulate -- as the lawyers say -- that this is just a rough summary of the procedure you've described, is it at least roughly accurate?  Have I left out anything important?

Thanks again for your guidance.

Best,

Steve 

0

When we look at terminals we need to make a distinction between those with a constant value (i.e. keywords) and those with variable values adhering to some regular expression such as typically an identifier.

Keywords are not really all that interesting. They are concrete syntax which is needed for parsing and readability by humans. They do not really exist in the AST (the Flags I described blur these lines a bit, but I hope you get the idea). In other words, the 'if' and 'then' in a selection statement are just decoration, pretty printing so to speak. Do not store them in structure, output them in textgen aspect and add them as constants in the editor aspect.

So no, keywords would typically not become properties of the concept.Other than that your summary seems accurate.

As for aliases, they are mostly a usability thing. The first keyword in a rule is just a good guess for many constructs in languages. You can do without altogether, but that just makes typing more cumbersome.

 

0

I see.  Thanks again for all your help, Remi.

Best,

Steve

0

Any updates on this project? I see the last update was for MPS 3.3. Any plans to port to newest version of MPS? I think it would also help with this topic making it easier to create MPS language targets for common languages like C, GO, Python, Perl, C++, Kotlin, etc..

0

Please sign in to leave a comment.