I'm trying to create a documentation generator for several languages. For this I need an AST, in order to know that, for instance, one comment belongs to a class and another one to a method of that class.
I started by writing this simple Python code, which displays the tree by walking it recursively:
```python
import sys

import antlr4
from ECMAScriptLexer import ECMAScriptLexer
from ECMAScriptParser import ECMAScriptParser


def handleTree(tree, lvl=0):
    # Print every terminal (token) node, indented by its depth in the tree.
    for child in tree.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            print(lvl * '│ ' + '└─', child)
        else:
            handleTree(child, lvl + 1)


input_stream = antlr4.FileStream(sys.argv[1])
lexer = ECMAScriptLexer(input_stream)
stream = antlr4.CommonTokenStream(lexer)
parser = ECMAScriptParser(stream)
tree = parser.program()
handleTree(tree)
```
Then I tried to parse JavaScript code like this with the ANTLR ECMAScript grammar:
```javascript
var = 52; // inline comment

function foo() {
  /** foo documentation */
  console.log('hey');
}
```
This outputs:

```
│ │ │ │ └─ var
│ │ │ │ │ │ └─
│ │ │ │ │ │ │ └─ =
│ │ │ │ │ │ │ │ │ │ └─ 52
│ │ │ │ │ └─ ;
│ │ │ └─ function
│ │ │ └─ foo
│ │ │ └─ (
│ │ │ └─ )
│ │ │ └─ {
│ │ │ │ │ │ │ │ │ │ │ │ └─ console
│ │ │ │ │ │ │ │ │ │ │ └─ .
│ │ │ │ │ │ │ │ │ │ │ │ └─ log
│ │ │ │ │ │ │ │ │ │ │ └─ (
│ │ │ │ │ │ │ │ │ │ │ │ │ │ └─ 'hey'
│ │ │ │ │ │ │ │ │ │ │ └─ )
│ │ │ │ │ │ │ │ │ └─ ;
│ │ │ └─ }
└─ <EOF>
```
All comments are ignored, because of the presence of `channel(HIDDEN)` in the grammar.
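To make the effect of `channel(HIDDEN)` concrete, here is a minimal, self-contained sketch that uses no ANTLR code at all. The `Tok` class, the helper names and the hand-built token list are hypothetical; only the channel numbers (0 for the default channel, 1 for `HIDDEN`) match ANTLR's `Token` constants. It illustrates that hidden tokens are not thrown away: they stay in the token stream, and the parser simply never consumes them.

```python
from dataclasses import dataclass

# Channel numbers mirror ANTLR's Token.DEFAULT_CHANNEL (0) and Token.HIDDEN_CHANNEL (1).
DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 1


@dataclass
class Tok:
    """Hypothetical stand-in for an ANTLR token: just its text and channel."""
    text: str
    channel: int = DEFAULT_CHANNEL


def parser_view(tokens):
    """Texts of the tokens a parser consumes: only those on the default channel."""
    return [t.text for t in tokens if t.channel == DEFAULT_CHANNEL]


def hidden_view(tokens):
    """Texts of the off-channel tokens: still in the stream, just never parsed."""
    return [t.text for t in tokens if t.channel != DEFAULT_CHANNEL]


# Hand-built tokens for "foo(); // note", as a lexer that sends comments
# and whitespace to the hidden channel might produce them.
tokens = [
    Tok("foo"), Tok("("), Tok(")"), Tok(";"),
    Tok(" ", HIDDEN_CHANNEL), Tok("// note", HIDDEN_CHANNEL),
]

print(parser_view(tokens))  # ['foo', '(', ')', ';']
print(hidden_view(tokens))  # [' ', '// note']
```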
After some googling I found this answer:
Unless you have a compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer.
So, why shouldn't comments be included in the parser, and how can I get a tree that includes comments?
If you remove the `-> channel(HIDDEN)` from the rule `MultiLineComment`

```
MultiLineComment
 : '/*' .*? '*/' -> channel(HIDDEN)
 ;
```

then the `MultiLineComment` tokens will end up in the parser. But then each of your parser rules needs to include these tokens wherever they're allowed.
For example, take the `arrayLiteral` parser rule:

```
/// ArrayLiteral :
///     [ Elision? ]
///     [ ElementList ]
///     [ ElementList , Elision? ]
arrayLiteral
 : '[' elementList? ','? elision? ']'
 ;
```
Since this is a valid array literal in JavaScript:

```javascript
[/* ... */ 1, 2 /* ... */ , 3 /* ... */ /* ... */]
```
it would mean you'd need to litter all your parser rules with `MultiLineComment` tokens like this:

```
/// ArrayLiteral :
///     [ Elision? ]
///     [ ElementList ]
///     [ ElementList , Elision? ]
arrayLiteral
 : '[' MultiLineComment* elementList? MultiLineComment* ','? MultiLineComment* elision? MultiLineComment* ']'
 ;
```
It would become one big mess.
EDIT

From the comments:
So it's not possible to generate a tree that includes comments with ANTLR? Are there some hacks or other libraries for this?
And GRosenberg's answer:
ANTLR provides a convenience method for this task: `BufferedTokenStream#getHiddenTokensToLeft`. While walking the parse tree, access the stream to obtain the comment associated with a node, if any. Use `BufferedTokenStream#getHiddenTokensToRight` to get any trailing comment.
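The idea behind `getHiddenTokensToLeft`/`getHiddenTokensToRight` can be sketched without a generated parser. The following is a simplified re-implementation over a hypothetical `Tok` list, not the ANTLR runtime itself: starting from a token's index, it walks outwards over off-channel tokens until it reaches the next token on the default channel. The `comments` helper is also hypothetical; it assumes whitespace may share the hidden channel, so it keeps only tokens that look like comments.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_CHANNEL = 0  # same value as ANTLR's Token.DEFAULT_CHANNEL
HIDDEN_CHANNEL = 1   # same value as ANTLR's Token.HIDDEN_CHANNEL


@dataclass
class Tok:
    """Hypothetical stand-in for an ANTLR token (text + channel only)."""
    text: str
    channel: int = DEFAULT_CHANNEL


def hidden_tokens_to_left(tokens: List[Tok], index: int) -> List[Tok]:
    """Off-channel tokens between the previous on-channel token and tokens[index]."""
    i = index - 1
    while i >= 0 and tokens[i].channel != DEFAULT_CHANNEL:
        i -= 1
    return tokens[i + 1:index]


def hidden_tokens_to_right(tokens: List[Tok], index: int) -> List[Tok]:
    """Off-channel tokens between tokens[index] and the next on-channel token."""
    j = index + 1
    while j < len(tokens) and tokens[j].channel != DEFAULT_CHANNEL:
        j += 1
    return tokens[index + 1:j]


def comments(hidden: List[Tok]) -> List[str]:
    """Keep only comment tokens (assumes whitespace may also be hidden)."""
    return [t.text for t in hidden if t.text.startswith(("/*", "//"))]


H = HIDDEN_CHANNEL
# Hand-built tokens for part of the question's JavaScript sample.
tokens = [
    Tok("52"), Tok(";"), Tok(" ", H), Tok("// inline comment", H), Tok("\n", H),
    Tok("function"), Tok(" ", H), Tok("foo"), Tok("("), Tok(")"), Tok(" ", H), Tok("{"),
    Tok("\n  ", H), Tok("/** foo documentation */", H), Tok("\n  ", H),
    Tok("console"), Tok("."), Tok("log"), Tok("("), Tok("'hey'"), Tok(")"), Tok(";"),
    Tok("\n", H), Tok("}"),
]

console = tokens.index(Tok("console"))
print(comments(hidden_tokens_to_left(tokens, console)))  # ['/** foo documentation */']
print(comments(hidden_tokens_to_right(tokens, 1)))       # ['// inline comment']
```

A comment that sits between two on-channel tokens is reported both as trailing for the first and as leading for the second (here `// inline comment` is to the right of `;` and to the left of `function`), so a documentation generator still has to decide which node owns it. With the real Python runtime, the corresponding calls on the question's `CommonTokenStream` would be `stream.getHiddenTokensToLeft(ctx.start.tokenIndex)` and `stream.getHiddenTokensToRight(ctx.stop.tokenIndex)` for a rule context `ctx`; unlike this sketch, they return `None` rather than an empty list when no hidden tokens are found.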