Frontend support for language extension with #lang-X #3363

nikswamy · 2024-07-19T01:07:32Z

For Pulse and other DSLs, we want tighter syntax integration. This PR provides support for writing

module A
#lang-somelang
let f x = 0
fn g () { ... }
let h x = 1

I.e., after #lang-something, the rest of the file is parsed by a custom parser registered for language "somelang". Such a parser can be an extension of the F* parser and what follows could be a mixture of F* and some custom syntax. The block can optionally end with an #end-lang, without it, the entire rest of the file is considered to be in extension syntax.

This is used in Pulse---where you can find a fully worked out example of how all this works.

Some of the main pieces:

FStar.Parser.AST.Util provides:

type extension_lang_parser = {
  parse_decls:
   (contents:string ->
    p:FStar.Compiler.Range.range ->
    either error_message (list decl))
}

val register_extension_lang_parser (extension_name:string) (parser:extension_lang_parser) : unit

This is the main parser hook for #lang extensions. The registered parser takes a string and a source range for the suffix of the buffer after the #lang pragma, and should produce a list of F* declarations (or an error)

FStar.Parser.ParseIt.mli provides a function that an extension can use to build an incremental parser that works well with VSCode. I.e. given a lexer and parser for one or more declarations, this function will incrementally parse until reaching the first error.

val parse_incremental_decls :
    (*filename*)string ->
    (*contents*)string ->
    FStar_Sedlexing.lexbuf ->
    (unit -> 'token * Lexing.position * Lexing.position) ->
    ('semantic_value -> FStar_Compiler_Range.range) ->
    ((Lexing.lexbuf -> 'token) -> Lexing.lexbuf ->
         ('semantic_value list * FStar_Sedlexing.snap option) option) ->
'semantic_value list * parse_error option

FStar.Parser.AST has two new constructs:

Unparseable : decl' which is a token to indicate a terminal parsing failure in a file, associated with a range. By including this token as a decl, we can produce a list of decls including up to the first error, suppressing the error. This is useful when launching the IDE on a file that may have syntax errors in it.
DeclToBeDesugared: A decl node which allows an extension to stash arbitrary data (e.g., its own parse tree) within the AST, for downstream processing. The node also includes callbacks to let an extension check blobs for equality (using for caching in the IDE), and to scan blobs for dependency analysis.

FStar.ToSyntax.ToSyntax: provides the following, to allow an extension to desugar DeclToBeDesugared node with the desugaring environment appropriate for that point in a file---using this ensures that an extension doesn't see out-of-order names in scope.

type extension_tosyntax_decl_t = env -> FStar.Compiler.Dyn.dyn -> lids:list lident -> Range.range -> list sigelt'
val register_extension_tosyntax (lang_name:string) (cb:extension_tosyntax_decl_t) : unit

FStar.Options provides an ide_filename which is set programmatically by the IDE mode, allowing other modules (notably extension parsers) to query the name of the file on which the IDE was launched. This is not a command line option.
FStar.Interactive.Ide: This is all mostly transparent to FStar.Interactive.IDE when using the full-buffer mode, i.e., for VSCode. However, for backwards compat with fstar-mode.el, which only sends fragments of buffers, I rigged the repl-state to keep track of which language parser is currently active and use that to parse the next chunk sent by emacs.

… node in the frontend AST, to tolerate syntax errors in the IDE

nikswamy · 2024-07-19T01:14:37Z

src/fstar/FStar.Main.fst

@@ -133,6 +133,8 @@ let go _ =
    | Success ->
        fstar_files := Some filenames;

+        load_native_tactics ();


load_native_tactics early, since now even the early phases of the parser, dependence scanning etc. can depend on a native plugin

gebner · 2024-07-19T01:33:43Z

ocaml/fstar-lib/FStar_Parser_LexFStar.ml

@@ -686,6 +696,22 @@ match %sedlex lexbuf with
   uninterpreted_blob snap name pos buffer lexbuf
 | _ -> assert false

+and use_lang_blob snap name pos buffer lexbuf =
+match %sedlex lexbuf with
+ | "#end-lang" ->


What do you think about ending an extension block with #lang-fstar (or #lang-pulse for that matter)?

I wonder if it is feasible for the extension parser/lexer to call back into the main parser. That is, if we could just add a #lang- rule to the pulse grammer?

One disadvantage of detecting the end of the extension block here is that it is unaware of the lexical syntax of the extension language (namely string literals and comments).

#lang-pulse (* We can exit a pulse block like this: #end-lang *) fn foo () requires emp returns _: string ensures emp { "This is how you end a pulse extension block: #end-lang Everything after that is F* again." }

Or an optional #end-lang-X to match a #lang-X?

The extension parser for Pulse does call back into the F* parser and lexer. So, it can, in principle, handle a nested #lang-x ... #end-lang-x block

The disadvantage you mentioned remains, but I guess it's not so bad ... you should be aware of the placement of the end-lang block.

That said, for Pulse itself, I don't see any reason to actually end the #lang-pulse block. Do you?

What I'm suggesting is that #lang-foo means "parse the rest of the file using Foo." And then the parser for Foo could have the same command, calling back to the main parser.

#lang-pulse fn foo () requires emp ensures emp { () } // not really necessary, as you point out #lang-fstar let bar = 42

Right, this works. You can even do

#lang-pulse fn stuff ... #lang-pulse fn more stuff

And it will recursively call back into the Pulse parser for the second block

I also registered an #lang-fstar parser and removed #end-lang

…, for use with #lang-X

…r use with #lang-X and fstar-mode.el

…ents in directly

… register a #lang-fstar parser; remove #end-lang markers

nikswamy · 2024-07-20T21:01:59Z

I also got an Everest green with this; it is still compatible with ``` pulse .. . ``` style extensions.

nikswamy added 10 commits July 17, 2024 23:11

changes in preparation for #lang-<ext>

0682aaf

Merge remote-tracking branch 'origin/master' into nik_use_lang

6879011

supporting for #lang-X, with some unit tests; also works for #lang-pulse

c9cba30

Merge remote-tracking branch 'origin/master' into nik_use_lang

c78bdc4

support for detecting the current file in the IDE, and an Unparseable…

6b97853

… node in the frontend AST, to tolerate syntax errors in the IDE

allow desugared blobs to be compared with a callback

b4b804a

support for syntax extension to offer their own dependence scanning

5b32f96

remove dump module in parser.driver (debugging)

a7b9123

mistakenly commented out some unit tests

1edff2f

snap

648d685

nikswamy commented Jul 19, 2024

View reviewed changes

gebner reviewed Jul 19, 2024

View reviewed changes

nikswamy added 8 commits July 19, 2024 16:42

backwards compat for fstar-mode.el use of Push operations in ide mode…

7c1bc74

…, for use with #lang-X

stashing open namespaces and module abbrevs in the repl_lang state fo…

afe6ca4

…r use with #lang-X and fstar-mode.el

Merge remote-tracking branch 'origin/master' into nik_use_lang

efad29a

remove stray debugging message

fe1e92e

exporting parser productons for decls with and without decorations

fa29b7d

simplifying desugaring of extension code, passing desugaring environm…

f84799b

…ents in directly

fix interleaving splices; fix handling of typeclass decorations

53d4a50

remove explicit passing of open namespaces to extension_lang parsers;…

3e1a61c

… register a #lang-fstar parser; remove #end-lang markers

nikswamy merged commit fa18fde into master Jul 21, 2024
3 checks passed

nikswamy deleted the nik_use_lang branch July 21, 2024 00:17

nikswamy mentioned this pull request Jul 21, 2024

Support for #lang-pulse FStarLang/pulse#168

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frontend support for language extension with #lang-X #3363

Frontend support for language extension with #lang-X #3363

nikswamy commented Jul 19, 2024 •

edited

Loading

nikswamy Jul 19, 2024

gebner Jul 19, 2024

nikswamy Jul 19, 2024

gebner Jul 19, 2024

nikswamy Jul 20, 2024

nikswamy Jul 20, 2024

nikswamy commented Jul 20, 2024 •

edited

Loading

Frontend support for language extension with #lang-X #3363

Frontend support for language extension with #lang-X #3363

Conversation

nikswamy commented Jul 19, 2024 • edited Loading

nikswamy Jul 19, 2024

Choose a reason for hiding this comment

gebner Jul 19, 2024

Choose a reason for hiding this comment

nikswamy Jul 19, 2024

Choose a reason for hiding this comment

gebner Jul 19, 2024

Choose a reason for hiding this comment

nikswamy Jul 20, 2024

Choose a reason for hiding this comment

nikswamy Jul 20, 2024

Choose a reason for hiding this comment

nikswamy commented Jul 20, 2024 • edited Loading

nikswamy commented Jul 19, 2024 •

edited

Loading

nikswamy commented Jul 20, 2024 •

edited

Loading