Align the grammar documentation with Python's actual grammar #127833

Open
@encukou

Documentation

The current documentation of Python syntax (the later chapters of the language reference) uses hand-maintained production lists, like this:

A)

compound_stmt ::=  if_stmt
                   | while_stmt
                   | for_stmt
                   | try_stmt
                   | with_stmt
                   | match_stmt
                   | funcdef
                   | classdef
                   | async_with_stmt
                   | async_for_stmt
                   | async_funcdef
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]

There is no mechanism to ensure that these are in sync with the actual grammar, and they inevitably do get out of sync.
See some of the “docs” issues mentioning “grammar”.
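To make the drift concrete, here is a minimal sketch of a crude check that only compares rule names between a docs listing and a grammar file. It is not the proof of concept mentioned in this issue. The function name `rule_name_drift` is hypothetical, and the regex for grammar rule headers (a name, an optional `[type]`, an optional `(memo)`, then a colon) is an assumption about the python.gram layout.

```python
import re

# Rule names in a docs production list: "name ::= ..." at the start of a line.
DOC_RULE = re.compile(r"^(\w+)\s*::=", re.MULTILINE)

# Assumed shape of a rule header in the grammar file:
# name, optional [type], optional (memo), then a colon.
GRAM_RULE = re.compile(r"^(\w+)(?:\[[^\]]*\])?\s*(?:\(memo\))?\s*:", re.MULTILINE)


def rule_name_drift(doc_listing: str, grammar_text: str):
    """Return (names only in the docs, names only in the grammar), both sorted."""
    doc_names = set(DOC_RULE.findall(doc_listing))
    gram_names = set(GRAM_RULE.findall(grammar_text))
    return sorted(doc_names - gram_names), sorted(gram_names - doc_names)
```

A name-only comparison says nothing about whether the rule bodies agree, and it would also flag rules that the docs leave out or rename on purpose, so at best it is a starting point for a manual review.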

It's not easy to write an automatic tool to keep them in sync, because we do want to elide some details (the parser rules, unnecessary lookaheads, cuts, etc.). But it is possible: we wrote a proof of concept, which will need to be rewritten, tuned, and reviewed. Before introducing it, I'd like to go through all the docs, correct the existing documentation, bring it closer to what a tool could generate, and discuss what the ideal presentation would look like. That needs to be a manual process, and it will also need to touch the prose next to the grammar snippets.

As a first step, I propose an update to the tooling, which brings the presentation a bit closer to the python.gram syntax.

From the existing ReST source, we can get this:

B)

compound_stmt: if_stmt
               | while_stmt
               | for_stmt
               | try_stmt
               | with_stmt
               | match_stmt
               | funcdef
               | classdef
               | async_with_stmt
               | async_for_stmt
               | async_funcdef
suite:         stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement:     stmt_list NEWLINE | compound_stmt
stmt_list:     simple_stmt (";" simple_stmt)* [";"]

Since Sphinx hard-codes the productionlist formatting (the ::= symbol and the alignment), we'll need to override the productionlist directive to achieve this.
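The A) to B) change is purely presentational, which a small text transformation can illustrate. The sketch below works on the rendered listing as plain text; it is not the Sphinx directive override itself, and the function name `a_to_b` is hypothetical.

```python
import re

# Start of a rule in style A: a name, padding, "::=", then the first alternative.
RULE_START = re.compile(r"^(?P<name>\w+)\s*::=\s*(?P<body>.*)$")


def a_to_b(listing: str) -> str:
    """Rewrite 'name ::=  body' as 'name: body', keeping all bodies aligned."""
    lines = listing.splitlines()
    names = [m.group("name") for line in lines if (m := RULE_START.match(line))]
    if not names:
        return listing
    # Bodies start right after the longest "name:" plus one space.
    width = max(len(name) for name in names) + 2
    out = []
    for line in lines:
        m = RULE_START.match(line)
        if m:
            out.append((m.group("name") + ":").ljust(width) + m.group("body"))
        else:
            # Continuation line: re-indent it to the new body column.
            out.append(" " * width + line.strip())
    return "\n".join(out)
```

Applied to listing A), this should reproduce the layout of listing B): the rule bodies are untouched, and only the separator and the alignment column change.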

Then, by changing the ReST and using a different directive, we can get to something like:

C)

compound_stmt:
    | if_stmt
    | while_stmt
    | for_stmt
    | try_stmt
    | with_stmt
    | match_stmt
    | funcdef
    | classdef
    | async_with_stmt
    | async_for_stmt
    | async_funcdef
suite:
    | stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement:
    | stmt_list NEWLINE | compound_stmt
stmt_list:
    | simple_stmt (";" simple_stmt)* [";"]
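The move from B) to C) is also mechanical as far as layout goes, although the proposal is to do it by hand while revising the content. As an illustration only, the sketch below puts each rule name on its own line and turns every physical line of the body into one indented `|` line. The function name `b_to_c` and the four-space indent are assumptions taken from listing C).

```python
import re

# Start of a rule in style B: a name, a colon, then the first alternative.
RULE_START = re.compile(r"^(?P<name>\w+):\s*(?P<body>.*)$")


def b_to_c(listing: str, indent: str = "    ") -> str:
    """Rewrite style B rules so the name stands alone and each body line starts with '|'."""
    out = []
    for line in listing.splitlines():
        m = RULE_START.match(line)
        if m:
            out.append(m.group("name") + ":")
            body = m.group("body").strip()
        else:
            body = line.strip()
        if not body:
            continue
        # Continuation lines already start with "|"; drop it so it is not doubled.
        if body.startswith("|"):
            body = body[1:].strip()
        out.append(f"{indent}| {body}")
    return "\n".join(out)
```

Alternatives that share one line in B), such as those of `suite`, stay on one line, which matches listing C).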

I propose to go from A) to B) at once (by overriding productionlist), and from B) to C) gradually, while also updating the content (including changing rule names to match the grammar, and adjusting/reorganizing nearby prose).
I think that the B) and C) styles are similar enough that mixing them in a single version of the docs should not be jarring.

By the way, one additional benefit of a custom directive is that we can add syntax highlighting. (Ideally, with support from the theme.) I think that making strings stand out makes the listings more readable:

[Image: a grammar listing with the string literals highlighted]
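To show what making strings stand out involves, here is a minimal sketch that splits a grammar line into string literals and everything else. It uses only the standard library and is not tied to any particular highlighter or theme. The function name `tokenize` and the two token kinds are hypothetical.

```python
import re

# A quoted literal (double or single quotes), or a run of any other text.
TOKEN = re.compile(r"""(?P<string>"[^"]*"|'[^']*')|(?P<other>[^"']+)""")


def tokenize(line: str):
    """Split a grammar line into (kind, text) pairs; kind is 'string' or 'other'."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(line)]
```

A renderer could then wrap the `string` tokens in whatever markup the theme styles. This sketch silently drops an unterminated quote character, which a real lexer would need to handle.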


As a second step, I'd like to rewrite the token documentation, and then the lexical analysis chapter, on which the grammar chapters build: #135676

Then, I'd continue with the grammar chapters in the language reference.
