diff --git a/compiler.rst b/compiler.rst
index 3338cef4e..40384cb07 100644
--- a/compiler.rst
+++ b/compiler.rst
@@ -10,8 +10,8 @@ Abstract
 In CPython, the compilation from source code to bytecode involves several steps:

-1. Parse source code into a parse tree (:file:`Parser/pgen.c`)
-2. Transform parse tree into an Abstract Syntax Tree (:file:`Python/ast.c`)
+1. Tokenize the source code (:file:`Parser/tokenizer.c`)
+2. Parse the stream of tokens into an Abstract Syntax Tree (:file:`Parser/parser.c`)
 3. Transform AST into a Control Flow Graph (:file:`Python/compile.c`)
 4. Emit bytecode based on the Control Flow Graph (:file:`Python/compile.c`)

@@ -23,49 +23,18 @@ in terms of the how the entire system works. You will most likely need
 to read some source to have an exact understanding of all details.


-Parse Trees
------------
+Parsing
+-------

-Python's parser is an LL(1) parser mostly based off of the
-implementation laid out in the Dragon Book [Aho86]_.
+As of Python 3.9, Python's parser is a PEG parser of a somewhat
+unusual design (since its input is a stream of tokens rather than a
+stream of characters as is more common with PEG parsers).

-The grammar file for Python can be found in :file:`Grammar/Grammar` with the
-numeric value of grammar rules stored in :file:`Include/graminit.h`. The
-list of types of tokens (literal tokens, such as ``:``, numbers, etc.) can
-be found in :file:`Grammar/Tokens` with the numeric value stored in
-:file:`Include/token.h`. The parse tree is made up
-of ``node *`` structs (as defined in :file:`Include/node.h`).
-
-Querying data from the node structs can be done with the following
-macros (which are all defined in :file:`Include/node.h`):
-
-``CHILD(node *, int)``
-    Returns the nth child of the node using zero-offset indexing
-``RCHILD(node *, int)``
-    Returns the nth child of the node from the right side; use
-    negative numbers!
-``NCH(node *)``
-    Number of children the node has
-``STR(node *)``
-    String representation of the node; e.g., will return ``:`` for a
-    ``COLON`` token
-``TYPE(node *)``
-    The type of node as specified in :file:`Include/graminit.h`
-``REQ(node *, TYPE)``
-    Assert that the node is the type that is expected
-``LINENO(node *)``
-    Retrieve the line number of the source code that led to the
-    creation of the parse rule; defined in :file:`Python/ast.c`
-
-For example, consider the rule for 'while':
-
-.. productionlist::
-   while_stmt: "while" `expression` ":" `suite` : ["else" ":" `suite`]
-
-The node representing this will have ``TYPE(node) == while_stmt`` and
-the number of children can be 4 or 7 depending on whether there is an
-'else' statement. ``REQ(CHILD(node, 2), COLON)`` can be used to access
-what should be the first ``:`` and require it be an actual ``:`` token.
+The grammar file for Python can be found in
+:file:`Grammar/python.gram`. The definitions for literal tokens
+(such as ``:``, numbers, etc.) can be found in :file:`Grammar/Tokens`.
+Various C files, including :file:`Parser/parser.c` are generated from
+these (see :doc:`grammar`).


 Abstract Syntax Trees (AST)
@@ -569,10 +538,6 @@ thanks to having to support both classic and new-style classes.
 References
 ----------

-.. [Aho86] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
-   `Compilers: Principles, Techniques, and Tools`,
-   https://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
-
 .. [Wang97] Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris S. Serra.
    `The Zephyr Abstract Syntax Description Language.`_
    In Proceedings of the Conference on Domain-Specific Languages, pp.
diff --git a/grammar.rst b/grammar.rst
index 912dbaef5..36b2985c3 100644
--- a/grammar.rst
+++ b/grammar.rst
@@ -7,22 +7,14 @@ Abstract
 --------

 There's more to changing Python's grammar than editing
-:file:`Grammar/Grammar`. This document aims to be a
-checklist of places that must also be fixed.
+:file:`Grammar/python.gram`. Here's a checklist.

-It is probably incomplete. If you see omissions, submit a bug or patch.
-
-This document is not intended to be an instruction manual on Python
-grammar hacking, for several reasons.
-
-
-Rationale
----------
-
-People are getting this wrong all the time; it took well over a
-year before someone `noticed `_
-that adding the floor division
-operator (``//``) broke the :mod:`parser` module.
+NOTE: These instructions are for Python 3.9 and beyond. Earlier
+versions use a different parser technology. You probably shouldn't
+try to change the grammar of earlier Python versions, but if you
+really want to, use GitHub to track down the earlier version of this
+file in the devguide. (Python 3.9 itself actually supports both
+parsers; the old parser can be invoked by passing ``-X oldparser``.)


 Checklist
@@ -30,29 +22,29 @@ Checklist

 Note: sometimes things mysteriously don't work. Before giving up, try ``make clean``.

-* :file:`Grammar/Grammar`: OK, you'd probably worked this one out. :-) After changing
-  it, run ``make regen-grammar``, to regenerate :file:`Include/graminit.h` and
-  :file:`Python/graminit.c`. (This runs Python's parser generator, ``Python/pgen``).
+* :file:`Grammar/python.gram`: The grammar, with actions that build AST nodes. After changing
+  it, run ``make regen-pegen``, to regenerate :file:`Parser/parser.c`.
+  (This runs Python's parser generator, ``Tools/peg_generator``).

 * :file:`Grammar/Tokens` is a place for adding new token types. After
   changing it, run ``make regen-token`` to regenerate :file:`Include/token.h`,
   :file:`Parser/token.c`, :file:`Lib/token.py` and
-  :file:`Doc/library/token-list.inc`. If you change both ``Grammar`` and ``Tokens``,
-  run ``make regen-tokens`` before ``make regen-grammar``.
+  :file:`Doc/library/token-list.inc`. If you change both ``python.gram`` and ``Tokens``,
+  run ``make regen-token`` before ``make regen-pegen``.

-* :file:`Parser/Python.asdl` may need changes to match the Grammar. Then run ``make
+* :file:`Parser/Python.asdl` may need changes to match the grammar. Then run ``make
   regen-ast`` to regenerate :file:`Include/Python-ast.h` and :file:`Python/Python-ast.c`.

 * :file:`Parser/tokenizer.c` contains the tokenization code. This is where you would add a new
   type of comment or string literal, for example.

-* :file:`Python/ast.c` will need changes to create the AST objects involved with the
-  Grammar change.
+* :file:`Python/ast.c` will need changes to validate AST objects involved with the
+  grammar change.

-* The :doc:`compiler` has its own page.
+* :file:`Python/ast_unparse.c` will need changes to unparse AST objects involved with the
+  grammar change ("unparsing" is used to turn annotations into strings per :pep:`563`).

-* The :mod:`parser` module. Add some of your new syntax to ``test_parser``,
-  bang on :file:`Modules/parsermodule.c` until it passes.
+* The :doc:`compiler` has its own page.

 * Add some usage of your new syntax to ``test_grammar.py``.