Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frontend support for language extension with #lang-X #3363

Merged
merged 18 commits into from
Jul 21, 2024
Merged

Conversation

nikswamy
Copy link
Collaborator

@nikswamy nikswamy commented Jul 19, 2024

For Pulse and other DSLs, we want tighter syntax integration. This PR provides support for writing

module A
#lang-somelang
let f x = 0
fn g () { ... }
let h x = 1

I.e., after #lang-something, the rest of the file is parsed by a custom parser registered for language "somelang". Such a parser can be an extension of the F* parser and what follows could be a mixture of F* and some custom syntax. The block can optionally end with an #end-lang, without it, the entire rest of the file is considered to be in extension syntax.

This is used in Pulse---where you can find a fully worked out example of how all this works.

Some of the main pieces:

  1. FStar.Parser.AST.Util provides:
type extension_lang_parser = {
  parse_decls:
   (contents:string ->
    p:FStar.Compiler.Range.range ->
    either error_message (list decl))
}

val register_extension_lang_parser (extension_name:string) (parser:extension_lang_parser) : unit

This is the main parser hook for #lang extensions. The registered parser takes a string and a source range for the suffix of the buffer after the #lang pragma, and should produce a list of F* declarations (or an error)

  1. FStar.Parser.ParseIt.mli provides a function that an extension can use to build an incremental parser that works well with VSCode. I.e. given a lexer and parser for one or more declarations, this function will incrementally parse until reaching the first error.
val parse_incremental_decls :
    (*filename*)string ->
    (*contents*)string ->
    FStar_Sedlexing.lexbuf ->
    (unit -> 'token * Lexing.position * Lexing.position) ->
    ('semantic_value -> FStar_Compiler_Range.range) ->
    ((Lexing.lexbuf -> 'token) -> Lexing.lexbuf ->
         ('semantic_value list * FStar_Sedlexing.snap option) option) ->
'semantic_value list * parse_error option
  1. FStar.Parser.AST has two new constructs:
  • Unparseable : decl' which is a token to indicate a terminal parsing failure in a file, associated with a range. By including this token as a decl, we can produce a list of decls including up to the first error, suppressing the error. This is useful when launching the IDE on a file that may have syntax errors in it.

  • DeclToBeDesugared: A decl node which allows an extension to stash arbitrary data (e.g., its own parse tree) within the AST, for downstream processing. The node also includes callbacks to let an extension check blobs for equality (using for caching in the IDE), and to scan blobs for dependency analysis.

  1. FStar.ToSyntax.ToSyntax: provides the following, to allow an extension to desugar DeclToBeDesugared node with the desugaring environment appropriate for that point in a file---using this ensures that an extension doesn't see out-of-order names in scope.
type extension_tosyntax_decl_t = env -> FStar.Compiler.Dyn.dyn -> lids:list lident -> Range.range -> list sigelt'
val register_extension_tosyntax (lang_name:string) (cb:extension_tosyntax_decl_t) : unit
  1. FStar.Options provides an ide_filename which is set programmatically by the IDE mode, allowing other modules (notably extension parsers) to query the name of the file on which the IDE was launched. This is not a command line option.

  2. FStar.Interactive.Ide: This is all mostly transparent to FStar.Interactive.IDE when using the full-buffer mode, i.e., for VSCode. However, for backwards compat with fstar-mode.el, which only sends fragments of buffers, I rigged the repl-state to keep track of which language parser is currently active and use that to parse the next chunk sent by emacs.

@@ -133,6 +133,8 @@ let go _ =
| Success ->
fstar_files := Some filenames;

load_native_tactics ();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_native_tactics early, since now even the early phases of the parser, dependence scanning etc. can depend on a native plugin

@@ -686,6 +696,22 @@ match %sedlex lexbuf with
uninterpreted_blob snap name pos buffer lexbuf
| _ -> assert false

and use_lang_blob snap name pos buffer lexbuf =
match %sedlex lexbuf with
| "#end-lang" ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about ending an extension block with #lang-fstar (or #lang-pulse for that matter)?

I wonder if it is feasible for the extension parser/lexer to call back into the main parser. That is, if we could just add a #lang- rule to the pulse grammer?

One disadvantage of detecting the end of the extension block here is that it is unaware of the lexical syntax of the extension language (namely string literals and comments).

#lang-pulse

(*
We can exit a pulse block like this:
#end-lang
*)

fn foo () requires emp returns _: string ensures emp {
  "This is how you end a pulse extension block:
#end-lang
Everything after that is F* again."
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or an optional #end-lang-X to match a #lang-X?

The extension parser for Pulse does call back into the F* parser and lexer. So, it can, in principle, handle a nested #lang-x ... #end-lang-x block

The disadvantage you mentioned remains, but I guess it's not so bad ... you should be aware of the placement of the end-lang block.

That said, for Pulse itself, I don't see any reason to actually end the #lang-pulse block. Do you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm suggesting is that #lang-foo means "parse the rest of the file using Foo." And then the parser for Foo could have the same command, calling back to the main parser.

#lang-pulse

fn foo () requires emp ensures emp { () }

// not really necessary, as you point out
#lang-fstar

let bar = 42

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this works. You can even do

#lang-pulse
fn stuff ...
#lang-pulse
fn more stuff

And it will recursively call back into the Pulse parser for the second block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also registered an #lang-fstar parser and removed #end-lang

@nikswamy
Copy link
Collaborator Author

nikswamy commented Jul 20, 2024

I also got an Everest green with this; it is still compatible with ``` pulse .. . ``` style extensions.

@nikswamy nikswamy merged commit fa18fde into master Jul 21, 2024
3 checks passed
@nikswamy nikswamy deleted the nik_use_lang branch July 21, 2024 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants