Tuesday, May 4, 2010

Generating Syntax/Railroad diagrams from Xtext

Problem
Some months ago Patrik and I started talking about syntax (railroad) diagrams for Sculptor. The syntax of the Sculptor DSL is now very rich, and we have no formal reference documentation. The advanced and developer documentation describes many details of the language, but a pure syntax description would be very helpful.

Solution
I started looking for tools and found the following syntax diagram generator:

It is a little old, but the output looks good. Now we only need a BNF definition of our language. I looked at the Xtext definition and started to change it manually. For example, this Xtext rule:
DslValueObject :
(doc=STRING)?
(abstract?="abstract")? "ValueObject" name=ID ("extends" (("@"extends=[DslValueObject]) | (extendsName=DslJavaIdentifier)))? "{"
("package" "=" package=DslJavaIdentifier )?
(((notOptimisticLocking?=NOT "optimisticLocking") | ("optimisticLocking")) |
((notImmutable?=NOT "immutable") | ("immutable")) |
((cache?="cache") | (NOT "cache")) |
((gapClass?="gap") | (noGapClass?="nogap")) |
(scaffold?="scaffold") |
("hint" "=" hint=STRING) |
("databaseTable" "=" databaseTable=STRING) |
("discriminatorValue" "=" discriminatorValue=STRING) |
("discriminatorColumn" "=" discriminatorColumn=STRING) |
("discriminatorType" "=" discriminatorType=DslDiscriminatorType) |
("discriminatorLength" "=" discriminatorLength=STRING) |
("inheritanceType" "=" inheritanceType=DslInheritanceType) |
("validate" "=" validate=STRING) |
((notPersistent?=NOT "persistent") | ("persistent")))*
((attributes+=DslAttribute) |
(references+=DslReference))*
(repository=DslRepository)?
"}";


has to be transformed into this BNF notation:
DslValueObject :
STRING?
"abstract"? "ValueObject" ID ("extends" (("@"DslValueObject) | DslJavaIdentifier))? "{"
("package" "=" DslJavaIdentifier )?
((NOT? "optimisticLocking") |
(NOT? "immutable") |
(NOT? "cache") |
("gap" | "nogap") |
"scaffold" |
("hint" "=" STRING) |
("databaseTable" "=" STRING) |
("discriminatorValue" "=" STRING) |
("discriminatorColumn" "=" STRING) |
("discriminatorType" "=" DslDiscriminatorType) |
("discriminatorLength" "=" STRING) |
("inheritanceType" "=" DslInheritanceType) |
("validate" "=" STRING) |
(NOT? "persistent"))*
(DslAttribute |
DslReference)*
DslRepository?
"}";
As you can see, it is not that different. In the first round I did it manually, but then I found that it could be automated with a short sed script. Store the following in convertToBNF.sed:

s/[A-Za-z]\+.=//g
s/\[//g
s/\]//g
s/""/" "/g
s/(\([A-Za-z"]\+\))/\1/g
s/^enum *//
s/^terminal *//
/^grammar /d
/^generate /d

You can apply it to an Xtext file with:
sed -f convertToBNF.sed Sculptordsl.xtext > Sculptordsl.bnf
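To check what the script does before running it on a whole grammar, here is a small self-contained sketch. It recreates convertToBNF.sed with exactly the nine commands above in a temporary directory and runs it over a hypothetical Xtext fragment (the grammar name, URI and rule body are made up for illustration). It assumes GNU sed, because `\+` in a basic regular expression is a GNU extension.

```shell
#!/bin/sh
# Sketch: rebuild convertToBNF.sed from the post in a scratch directory and
# run it over a small, hypothetical Xtext fragment. Requires GNU sed.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# The nine commands of convertToBNF.sed, copied verbatim.
cat > "$work/convertToBNF.sed" <<'EOF'
s/[A-Za-z]\+.=//g
s/\[//g
s/\]//g
s/""/" "/g
s/(\([A-Za-z"]\+\))/\1/g
s/^enum *//
s/^terminal *//
/^grammar /d
/^generate /d
EOF

# Hypothetical input: grammar name, URI and rule body are made up.
cat > "$work/Example.xtext" <<'EOF'
grammar org.example.Example with org.eclipse.xtext.common.Terminals
generate example "http://www.example.org/Example"
DslValueObject :
(abstract?="abstract")? "ValueObject" name=ID
("extends" "@"extends=[DslValueObject])? "{"
(scaffold?="scaffold")?
"}";
EOF

sed -f "$work/convertToBNF.sed" "$work/Example.xtext" > "$work/Example.bnf"
cat "$work/Example.bnf"
# Expected output:
# DslValueObject :
# "abstract"? "ValueObject" ID
# ("extends" "@"DslValueObject)? "{"
# "scaffold"?
# "}";
```

The `grammar` and `generate` header lines are deleted, assignments such as `name=` and `abstract?=` disappear, the cross-reference brackets are dropped, and the parentheses around single keywords are removed.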

Now you can copy and paste the content of Sculptordsl.bnf into the text area on the previously mentioned site:

To make the diagrams look nice, we will render them with a bigger font and then scale them down to roughly 50% with anti-aliasing. Set the font size to 22 and the image size to double the size of the expected result. For our example I used a width of 2000 and a height of 2000. For big grammars, several images are generated on the page. Just save the whole page to a directory: modern browsers store the HTML file and put all other resources (images, stylesheets, scripts) into a separate directory. Go to that resource directory and run the following command (it needs the ImageMagick convert tool):
for i in *.png; do convert "$i" -trim -resize 52% "${i%.png}-S.png"; done
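If you run that loop a second time, it also picks up the `-S.png` files it created and produces `-S-S.png` copies. Here is a slightly more defensive sketch wrapped in a function; `shrink_diagrams` is a hypothetical helper name, not part of any tool. It still assumes ImageMagick's `convert` is on the PATH and uses the same `-trim -resize 52%` options.

```shell
#!/bin/sh
# Sketch: shrink every PNG saved from the generator page, skipping the
# *-S.png copies produced by an earlier run. Assumes ImageMagick's `convert`
# is on the PATH; shrink_diagrams is a hypothetical helper name.
shrink_diagrams() {
  dir=${1:-.}                        # default: current directory
  for i in "$dir"/*.png; do
    [ -e "$i" ] || continue          # no match: the glob stayed literal
    case "$i" in
      *-S.png) continue ;;           # already a reduced copy
    esac
    # Same options as the one-liner: trim borders, then scale to 52%.
    convert "$i" -trim -resize 52% "${i%.png}-S.png"
  done
}
```

Call it as `shrink_diagrams path/to/saved_files`; without an argument it works in the current directory.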

Now you have smaller versions of the original syntax diagrams with nice anti-aliasing. Here is my result:


Conclusion
As you can see, generating nice-looking syntax diagrams from an Xtext grammar isn't that difficult. In the next few days we will create a page with all the Sculptor DSL syntax diagrams.

The question is: "Can I apply this to my Xtext grammar too?" sed works on plain text and knows nothing about the grammar's structure, so an unusual declaration can confuse it, but for ordinary grammars it should work. A dedicated tool that parses the Xtext notation, applies some transformations and simplifications for nicer results, and generates BNF directly would be better, but that is a bigger job, and sed works well for the Sculptor Xtext grammar.

Here is the full version of the sed script I currently use to transform an Xtext grammar into BNF notation. At the end of the script you can also see some transformations that beautify the BNF:

s/[A-Za-z]\+.=//g
s/\[//g
s/\]//g
s/""/" "/g
s/(\([A-Za-z"]\+\))/\1/g
s/^enum *//
s/^terminal *//
/^grammar /d
/^generate /d
$i \
ID :\
LETTER (LETTER | NUMBER)*;\
\
STRING :\
"""" CHAR+ """";\
\
LETTER :\
"A-Z" | "a-z";\
\
CHAR :\
LETTER | NUMBER | " ~!@#$%&*()_+-={}[]:|\';/.,;<>?";\
\
INTEGER :\
NUMBER+;\
\
NUMBER :\
'0-9';\

# Syntactic sugars
# X | (Y X) => Y? X
s/\([A-Za-z"]\+\) *| *(\([A-Za-z"]\+\) *\1)/\2? \1/g

# X | (X Y) => X Y?
s/\([A-Za-z"]\+\) *| *(\1 *\([A-Za-z"]\+\))/\1 \2?/g

# (X Y) | Y => X? Y
s/(\([A-Za-z"]\+\) *\([A-Za-z"]\+\)) *| *\2/\1? \2/g

# (X Y) | X => X Y?
s/(\([A-Za-z"]\+\) *\([A-Za-z"]\+\)) *| *\1/\1 \2?/g
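To see what the four syntactic-sugar rules do in isolation, the sketch below wraps them in a hypothetical `beautify` shell function (GNU sed assumed) and feeds it one line per rule. The `"cache"` and `"persistent"` lines are the intermediate forms from the ValueObject example; the `"static"`/`"final"` lines are made-up inputs for the two rules that example does not exercise.

```shell
#!/bin/sh
# Sketch: the four "syntactic sugar" substitutions from the end of the full
# script, applied in the same order. beautify is a hypothetical helper name.
beautify() {
  sed \
    -e 's/\([A-Za-z"]\+\) *| *(\([A-Za-z"]\+\) *\1)/\2? \1/g' \
    -e 's/\([A-Za-z"]\+\) *| *(\1 *\([A-Za-z"]\+\))/\1 \2?/g' \
    -e 's/(\([A-Za-z"]\+\) *\([A-Za-z"]\+\)) *| *\2/\1? \2/g' \
    -e 's/(\([A-Za-z"]\+\) *\([A-Za-z"]\+\)) *| *\1/\1 \2?/g'
}

# One input line per rule: X | (Y X), X | (X Y), (X Y) | Y, (X Y) | X
printf '%s\n' \
  '("cache" | (NOT "cache"))' \
  '"static" | ("static" "final")' \
  '((NOT "persistent") | "persistent")' \
  '("static" "final") | "static"' \
  | beautify
# Expected output:
# (NOT? "cache")
# "static" "final"?
# (NOT? "persistent")
# "static" "final"?
```

Each rule only matches single tokens made of letters and double quotes, so more complex alternatives pass through unchanged; that is the plain-text limitation of the sed approach.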
