Fix "box model" for code blocks #1317

panglesd · 2025-02-20T09:01:13Z

TODO:

Improve tests.
Add docs/check how clear it is

In my mental model, code blocks are supposed to follow a "box model", where the actual content of a code block is the smallest box that contains it all:

In the example above, the actual content of the code blocks is in lighter blue. As you can see, some whitespace has been stripped before, after, but also at the beginning of each line.

However, (supposing my mental model is the right one), there are two bugs in the implementation.

The first one is about lines with only whitespace, but less than the box:

Suppose that _ are spaces (I used _ to make it visible). I think those spaces are not part of the box and should be stripped, but currently they are part of the content, which would be:

    Here is some content
___
With some other content

  Hey!

The second one is about a corner case of the "box model": when the code block does not start with a new line.

In this case, it is clear what the content of the code block should be:

However, in this case it is not!

Of course, {[ should not be part of "the box". So, what do we put instead?

One possibility would be to fill it with spaces:

     Here is some content

With some other content

Another solution would be to keep the first line as it is, in this degenerated case:

 Here is some content

With some other content

The currently implemented solution is very confusing in my opinion. For the first line, it removes a number of spaces corresponding to the min between the de-indentation level and the number of opening spaces in the first line. In effect it ignores the offset in which the first line start.

The first solution seems the most consistent with the box model, but also probably the most "breaking change". The second solution seems good as:

{[a
b
c]}

is turned as:

a
b
c

and not

  a
b
c

as it would in the first solution.

Which solution do you think is best?

panglesd · 2025-02-20T09:09:52Z

I think this PR should serve as a basis for @NoahTheDuke to add the layout information necessary to reconstruct the actual layout from the content. Indeed, without a proper "box model", it will be hard to have a nice layout type.

I think this layout could have the following type:

type code_block_layout = {
  top : string;
  left : int list;
  bottom : string
}

And that it should be possible to reconstruct the content with the following function:

let reconstruct layout content =
  let lines = String.split_on_char '\n' content in
  let lines = List.map2 (fun i l -> (String.make ' ' i ^ l) layout.left lines in
  layout.top ^
  String.concat ~sep:'\n' lines
  ^ layout.bottom

dbuenzli · 2025-02-20T09:18:25Z

I think you rather want to mostly follow what commonmark does here slightly adapted for the fact that the content can be on the line delimiting the block {[. That would be:

Strip leading and trailing whitespace from the contents of {[…]}.
The first non whitespace character in {[…]} defines the left margin column m
For each line, indent to column m, if there is a non-whitespace character before column m
For each line strip whitespace < m.

panglesd · 2025-02-20T09:32:52Z

That would be a possibility too, though more breaking (although I'm not sure how many code blocks would change in practice).

That being said, I'm not sure how close this is to what commonmark does. From what I understand, commonmark only looks at the indentation of the opening fence (as mentioned in this example). In what you suggest, it's not the fence opening which determines the left margin, but first non-whitespace character, which feels quite different to me.

We could also use the fence to determine left-margin. But, in

{[
    bli
]}

the content would become indented.

I would be in favor of keeping the current box model. It's fewer changes, and only few people would relate the "common-mark like" proposal to their experience of commonmark. But I'm actually fine with both.

panglesd · 2025-02-20T09:37:06Z

Also, the box model allows to start with spaces (but not newlines), and so do commonmark. Using the first non-whitespace as left-margin does not allow that! (Although, I cannot give an example of a plausible scenario where we need to do that, I'm sure there is one!)

dbuenzli · 2025-02-20T09:43:34Z

That being said, I'm not sure how close this is to what commonmark does. From what I understand, commonmark only looks at the indentation of the opening fence (as mentioned in this example). In what you suggest, it's not the fence opening which determines the left margin,

Yes it is different: there is never any content on the opening fence in commonmark so you can use the marker for defining the indentation level.

Using the first non-whitespace as left-margin does not allow that

You could escapes for that but I'm no longer sure how escapes work in ocamldoc it has always been a bit ill-defined an confusing.

That being said if your goal is this then you would be better off keeping a precise location of the start of content and simply keep the raw string of the content in the node.

dbuenzli · 2025-02-20T09:48:50Z

keep the raw string of the content in the node.

(And not even, if you have precise start/end locations and the source you can simply extract the part)

dbuenzli · 2025-02-20T09:59:46Z

I would be in favor of keeping the current box model.

Note that the problem with the current box model, is that it's not very clear. To this day I always write code fences fully flushed left to make sure I get what I want… This allows for a clear mental model in my opinion: namely something clear has to define the indentation level in the beginning of the code fence and then everything is simply left aligned on that indentation level with any whitespace before reaching the indentation level being dropped.

NoahTheDuke · 2025-02-20T12:06:42Z

Ocaml's comments are naturally indented, which makes using Commonmark's style harder. It requires deleting indentation that an editor or formatter might try to add.

jonludlam · 2025-02-20T13:42:15Z

(And not even, if you have precise start/end locations and the source you can simply extract the part)

Mdx does it this way - by getting the locations (which are indeed precise) and splicing the text from the original file.

NoahTheDuke · 2025-02-20T14:49:06Z

To be clear, the only benefit to handling this is for display, right? Ocaml doesn't care about indentation, so Mdx and/or an odoc extract command would just extract the raw text and slap it into a file. Having the precise location of the text within the {[...]} tags is only meaningful for displaying in docs files or hovers.

NoahTheDuke · 2025-02-20T16:26:31Z

I think the box model is the correct way to think about code blocks (and verbatim blocks, while we're at it).

ocaml/ocaml doesn't have a consistent way of indenting code blocks right now.

1) stdlib/float.mli puts the first line against the braces and then indents the following lines to the column where the braces sit (note the 2 space indent to everything, including the opening of the comment block):

  (** ...
      For example, consider the following program:
  {[let size = 100_000_000
  let a = Float.Array.make size 1.
  let update a f () =
     Float.Array.iteri (fun i x -> Float.Array.set a i (f x)) a
  let d1 = Domain.spawn (update a (fun x -> x +. 1.))
  let d2 = Domain.spawn (update a (fun x ->  2. *. x +. 1.))
  let () = Domain.join d1; Domain.join d2
  ]}
  ... *)

2) stdlib/condition.mli has a line break and then indents to the column one past the square brace:

(** ...
   {[
     Mutex.lock m;
     while not P do
       Mutex.unlock m; Mutex.lock m
     done;
     <update the data structure>;
     Mutex.unlock m
   ]}
... *)

3) stdlib/bytes.mli has a line break and no indentation:

(** ...
   {[
let string_init len f : string =
  let s = Bytes.create len in
  for i = 0 to len - 1 do Bytes.set s i (f i) done;
  Bytes.unsafe_to_string s
   ]}
... *)

4) Another part of stdlib/bytes.mli dedents the braces, has no spaces or line break, and no indentation of the code block:

(** ...
    For example, consider the following program:
{[let size = 100_000_000
let b = Bytes.make size  ' '
let update b f ()  =
  Bytes.iteri (fun i x -> Bytes.set b i (Char.chr (f (Char.code x)))) b
let d1 = Domain.spawn (update b (fun x -> x + 1))
let d2 = Domain.spawn (update b (fun x -> 2 * x + 1))
let () = Domain.join d1; Domain.join d2
]}
... *)

5) stdlib/arg.mli dedents the braces, and indents the code block two spaces past where the text above sits (an awkward 5 spaces):

(** ...
   This module provides a general mechanism for extracting options and
   arguments from the command line to the program. For example:

{[
     let usage_msg = "append [-verbose] <file1> [<file2>] ... -o <output>"
     let verbose = ref false
     let input_files = ref []
     let output_file = ref ""
... *)

6) stdlib/nativeint.mli has a line break and then indents the code block to align with the square brace instead of 1 past:

(** ...
    {[
     let zero: nativeint = 0n
     let one: nativeint = 1n
     let m_one: nativeint = -1n
    ]}
... *)

7) stdlib/queue.mli has a line break and then indents to align with the curly brace:

(** ...
    {[
    # let q = Queue.create ()
    val q : '_weak1 Queue.t = <abstr>
... *)

8) stdlib/dynarray.ml (not .mli) has no line break and indents to 1 past the square braces:

  (** ...
      We could instead provide
      [val find : ('a, 'stamp) with_dummy -> ('a, 'stamp dummy) result]
      but this would involve intermediary allocations.

      {[match find x with
        | None -> ...
        | Some v -> ...]}
  ... *)

9) Most cursed of all, stdlib/either.mli dedents the braces, has no linebreak and then indents past the code of the initial line as if it was not on the same line as the braces:

(** ...
    For example:

{[List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list]}
... *)

(This is the only instance of this style in ocaml/ocaml, which means we can safely ignore it (or even change it directly to align with other styles).)

All of the code blocks except for 9) are consistent with the below algorithm.

If there is any text on the line containing the opening braces, ignore it for now.
Find the least indented line in the remaining lines, store the indent as the left edge of the box.
Remove the left edge indentation of each remaining line.
If there is any text on the line containing the opening braces, trim the whitespace and insert it at the front.

jonludlam · 2025-02-20T17:25:04Z

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread. If we did it that way we'd avoid the need for the original source as currently required by mdx.

NoahTheDuke · 2025-02-20T20:06:01Z

I took a shot at implementing my algorithm above which led to #1318. Can be helpful to compare the two sets of outputs.

dbuenzli · 2025-02-20T21:32:51Z

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread.

That looks ok to me.

Ocaml's comments are naturally indented, which makes using Commonmark's style harder. It requires deleting indentation that an editor or formatter might try to add.

Nothing different here, CommonMark also does that. The only difference is that CommonMark's delimiters stand on their own line (and is thus easier to use as the indentation mark).

To be clear, the only benefit to handling this is for display, right?

No, you need precise locations and layout for compilation errors of the extracted source code to be reported at the right place in the original. You need a line directive followed by the exact, verbatim, code for compilation errors to show up exactly at the right place in the source.

All of the code blocks except for 9) are consistent with the below algorithm.

Unless I'm mistaken that is the case for my proposal too. Personally I don't see any benefit of using the line with the smallest indent as you suggest (rather than the first line):

It makes it harder to spot the indentation point on larger code samples.
It's more brittle to careless c&p and/or additions.

Besides the same problem as @panglesd mentioned remains: how do you solve the problem of enforcing an initial indent ? For the latter I have a few ideas but I don't really like them. One would be to simply use the non newline whitespace directly following {[ to define the indent if the line is blank (made of only whitespace). It is obscure but then we can say that it's likely only used for 0.01% of the cases.

NoahTheDuke · 2025-02-21T13:49:03Z

The reason you use the lowest indent is that in example 9), if it had multiple lines, then you'd account for additional indentation on the second line:

(** ...
    For example:

{[List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list

  List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list]}
... *)

renders as

       [%expect {|
         ((output
           (((f.ml (2 4) (2 16))
             (paragraph
              (((f.ml (2 4) (2 7)) (word For)) ((f.ml (2 7) (2 8)) space)
               ((f.ml (2 8) (2 16)) (word example:)))))
            ((f.ml (4 0) (8 63))
             (code_block
              ((f.ml (4 2) (8 61))
                "List.partition_map:\
               \n  ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list\
               \n\
               \nList.partition_map:\
               \n  ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list")))))
          (warnings ())) |}]

In what scenarios do you want to enforce an initial indent in a code block?

dbuenzli · 2025-02-21T20:28:57Z

The reason you use the lowest indent

From a usability point of view you turn the problem of understanding how stuff is going to be indented into checking n (i.e. every) lines rather than 1. This becomes particularly acute when your code blocks spans more than one screenful. So I really rather not have that solution.

In what scenarios do you want to enforce an initial indent in a code block?

Who knows ? There's always the odd case out there and it's enraging the day you need it. But a simple example is ASCII art.

{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}

NoahTheDuke · 2025-02-21T20:52:17Z

Your example would be correctly handled by finding the lowest indent (tape), and incorrectly handled by using the first line's indent ( +).

Using lowest indent:

     let stdlib_either =
       test "
{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}
       ";
       [%expect {|
         ((output
           (((f.ml (2 0) (6 2))
             (code_block
              ((f.ml (2 2) (6 0))
                "     +---+---+---\
               \ntape | A | A | \226\128\166\
               \n     +---+---+--- ")))))
          (warnings ())) |}]

Using first line indent:

     let stdlib_either =
       test "
{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}
       ";
       [%expect {|
         ((output
           (((f.ml (2 0) (6 2))
             (code_block
              ((f.ml (2 2) (6 0))
                "+---+---+---\
               \n| A | A | \226\128\166\
               \n+---+---+--- ")))))
          (warnings ())) |}]

dbuenzli · 2025-02-21T21:10:40Z

Right, now suppose you want to write:

Here is the box:

{[
+---+
| A |
+---+
]}

After shifting one position to the right you get this configuration:

{[
    +---+
    | A |
    +---+
]}

dbuenzli · 2025-02-21T21:27:30Z

We could also use the fence to determine left-margin. But, in
{[
    bli
]}
the content would become indented.

I think in the end that this is actually the more natural and less confusing overall (and allows to specify any initial indent you wish without fuss). There is the question of what happens when you have:

  {[bli
  …
  ]}

but you can simply say that this is equivalent to:

  {[
  bli
  …
  ]}

So basically the first column of {[ defines the left margin. Anything that is on the left of that margin is automatically indented to the margin. The rule is simple to understand and allows for any hand-made ASCII layout you wish.

NoahTheDuke · 2025-02-21T23:33:27Z

That's pretty clever. Thanks for thinking it through with us.

panglesd · 2025-02-24T11:08:07Z

Thanks for all the ideas and discussion. Although this PR is "just" a fix for an already established syntax, and not an attempt to change it, it's good to have feedback on it, and we can maybe change it. I'll try to sum up the choice we have, but first:

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread. If we did it that way we'd avoid the need for the original source as currently required by mdx.

Mdx currently changes the code blocks. But you want it to change them while preserving the layout. For instance:

{[
              # let x = 1 ;;
]}
{[
# let x = 1 ;;
]}

should be rewritten to

{[
              # let x = 1 ;;
              val x : int
]}
{[
# let x = 1 ;;
val x : int
]}

so that eg it does not fight with ocamlformat which is responsible for the layout.

That is why we might need a bit more than the raw string, but a standardized way of going from/to raw content/deindented content.

I'll try to sum up the pros and cons of each possibility, at this point:

Here are the nice properties we would like to have:

It is easy for the user to "intuitively" know what will be the content of the code block from the raw string
It does not break too many existing documentation
It allows to have an indentation in the first line
Be fit for long code blocks:

Be easy to spot the indentation point on larger code samples.
Not be brittle to careless c&p and/or additions.

We have several candidates for that:
A. Using the smallest indent.

I think that (with this PR fixing the bugs) it's as intuitive as the others
It's just fixing bugs so should not break any documentation
Does allow to have an indentation as the first line
Not so fit for long code blocks as the indentation depend on it all

B. Using the first indent

I think it's as intuitive as the others. Eg I think @NoahTheDuke got it wrong here: tape should be part of the output which would break the indentation
Breaking impact has to be measured but not that much code starts with an indentation
Does not allow to have an indentation as the first line which is a problem for ascii art-based languages ^^
Fit for long code blocks

C. Using the opener's indent

I think it's as intuitive as the others.
Breaking impact has to be measured. At least ocamlformat's default formatting makes it non-breaking.
Allow to have an indentation as the first line
Fit for long code blocks

So I'm fine with A. or C. as long as we find out it's not going to break to many code blocks!

I'll try to investigate C's breaking.

dbuenzli · 2025-02-24T11:34:06Z

Btw. I'm just curious are verbatim blocks treated differently from code blocks ? Because with the odoc3. I'm currently forced to write e.g.:

    {- [index_properties], the index properties. 0 if not indexed.
{v
7                               0
+---+---+---+---+---+---+---+---+
|   |   |   |   | L | S | T | F |
+---+---+---+---+---+---+---+---+
v}
    }

To make sure the verbatim block is not indented. If I write it:

    {- [index_properties], the index properties. 0 if not indexed.
       {v
       7                               0
       +---+---+---+---+---+---+---+---+
       |   |   |   |   | L | S | T | F |
       +---+---+---+---+---+---+---+---+
       v}
    }

the block is indented:

dbuenzli · 2025-02-24T11:40:26Z

So I'm fine with A. or C. as long as we find out it's not going to break to many code blocks!

Note that only C allows you to write this example. The problem is not about being able to indent on the first line, it's to be able to indent the whole content as you wish in the code block.

jonludlam · 2025-02-24T11:52:53Z

That is why we might need a bit more than the raw string, but a standardized way of going from/to raw content/deindented content.

Yes, that's what I'm asking for.

As an example where we might want the raw text rather than the processed stuff, we could do a better job of error messages than mdx does today by inserting line directives, so that the line numbers matched the source file correctly. However, I don't think there's such a thing as 'column directives', so to get the column numbers to match the source file correctly you could insert each line in exactly from the source, including the indentation.

panglesd · 2025-02-24T12:21:24Z

Some more though about C:
If we use the first character ({) of the opening, we get that ocamlformat automatically indent code blocks, that is lots of code blocks are affected by this change:

(** I'm formatted by ocamlformat

    {[
      hello ocamlformat
    ]} *)

renders as hello ocamlformat

If we use the last character ([), then many things will get de-indented:

(** I'm formatted by ocamlformat

    {@ocaml[
      let x =
        match w with
        | AAAAAAAAAAAAAAAAAAAAAAAA -> 1
        | BBBBBBBBBBBBBBBBBBBBBBBBB -> 2
    ]} *)

will render as

let x =
match w with
| AAAAAAAAAAAAAAAAAAAAAAAA -> 1
| BBBBBBBBBBBBBBBBBBBBBBBBB -> 2

We could use as indent the position of { + 2 but well...

dbuenzli · 2025-02-24T12:58:52Z

so to get the column numbers to match the source file correctly you could insert each line in exactly from the source, including the indentation.

Yes that's what you need if you want to use line directives.

we get that ocamlformat automatically indent code blocks

Well this could be changed in ocamlformat.

In any case I don't think it's a good idea to indent after {[, you are usually already quite nested and you still want to preserve horizontal space to write your code blocks.

panglesd · 2025-02-24T13:03:43Z

Well this could be changed in ocamlformat.

Yes, what I'm saying is more: there are currently many projects using ocamlformat, with automatically indented code blocks. Changing the indentation to be based on the opening { will "break" all of them. (with "breaking" meaning adding two spaces in front)

About horizontal space: I'm slightly concerned with

  Some
    reason
      to
        indent 
          {[
let x =
  1
          ]}

being rendered as

let x =
1

panglesd · 2025-02-24T13:09:20Z

Out of the example given by @NoahTheDuke at #1317 (comment), I think that examples 2, 3, 5 and 6 will get their rendering wrong:

2 5 and 6 will be indented too much
3 will be deindented

dbuenzli · 2025-02-24T13:36:35Z

Note the whole code block semantics has already been through a few changes and it's perhaps a good idea to review these discussions to what got us to what we have now.

#39
#134
#133
#132

Personally I don't think the right decision was taken at the time (it seems they did not realize that you could not enforce arbitrary layouts) but perhaps it's not worth changing again. However I have to say, as far as I'm concerned, the only reliable way of working with code block and verbatims in ocamldoc has remained the "ugly workaround" mentioned here, that is always write all that fully flushed left; so the status quo may not be optimal either.

panglesd · 2025-03-03T21:11:18Z

This PR is ready for review:

Implements what has been decided here, with the slight changes that single line code blocks ({[ blibli ]}) are trimmed and accepted, as discussed with @jonludlam.
Raises warnings both on "too few indentation" and "multiline code blocks not starting with a new line".
Verbatims are treated the same way as code blocks. It was previously not specified how they are treated in the documentation, and it seems good to have them as code blocks. From what I can understand, everyone agrees with that in the discussion, but there was no discussion focused on that, so let me know if I'm wrong.
Tests are updated.
Docs are updated.

dbuenzli · 2025-03-03T21:27:02Z

that single line code blocks ({[ blibli ]}) are trimmed and accepted,

Is it really worth making an exception here ? It makes the construct more complex to understand and perhaps that some explanations can be written more compactly that way.

For example:

In fact the we can simply replace the line 
  {[  let x = List.iter f v  ]}
of the above function by 
  {[  let y = Array.iter f v ]}
and get the result we wish.

and "multiline code blocks not starting with a new line".

Are you sure about that one ? In the past odoc started to warn on a few things that turned out to be rather annoying, I would rather be conservative and not do that.

panglesd · 2025-03-04T07:46:05Z

It makes the construct more complex to understand and perhaps that some explanations can be written more compactly that way.

I'm a bit confused! The first part of the sentence seems to say it's not worth it, and the second part seem to say that it is worth it 🫤
Treating single line code blocks as "exceptional" have three effects:

It allows to trim at the end of the string (in {[ hello ]} we probably don't want the ending space. Currently code blocks trim trailing lines but not trailing space on the last line, I kept that behavior).
We don't raise a warning.
We explain it a bit differently. I could not find a way to explain it that was easy to understand both for one-liners and multilines, without explaining those two cases separately. I think it is fine if "multiline content not starting by a new line" is harder to understand.

So, if we trim trailing spaces in code blocks, which maybe we should do, there is no distinction in how multiline/single lines content is handled. But I would still prefer to explain how they are handled "case by case".

In the past odoc started to warn on a few things that turned out to be rather annoying, I would rather be conservative and not do that.

I ~~think~~ ~~hope~~ maybe that was because odoc was warning about dependencies problems, and that with the new flags which allow to filter warnings the situation will be better, which will allow to fix the warnings as they come.

dbuenzli · 2025-03-04T08:01:45Z

I'm a bit confused! The first part of the sentence seems to say it's not worth it, and the second part seem to say that it is worth it 🫤

No I think we misunderstood each other

You said:

Implements what has been decided here, with the slight changes that single line code blocks ({[ blibli ]}) are trimmed and accepted, as discussed with @jonludlam.

To me this means String.trim so also trimming the initial space which is what I would object (I care a bit less about the trailing thing if you want to make an exception here). If you keep the initial white that's fine with me.

But then I just find that simply saying that:

{|…

is strictly equivalent to

{|
…

and then explain how:

{|
…
|}

works is the easiest. You no longer need a notion of single or multi line code block. There are just code blocks.

I ~~think~~ ~~hope~~ maybe that was because odoc was warning about dependencies problems, and that with the new flags which allow to filter warnings the situation will be better, which will allow to fix the warnings as they come.

Not at all. It was warnings about the way I wrote things in odoc that suddenly odoc had opinions about. It was quite annoying.

panglesd · 2025-03-04T09:09:47Z

No I think we misunderstood each other

Indeed!

What I had understood from this comment, was that:

  {[   text after space
  ...
  ]}

was considered as:

  {[
  text after space
  ...
  ]}

but in fact, I now think you meant that it is turned into:

  {[
     text after space
  ...
  ]}

So in what I understood version, the beginning was trimmed anyway, which explains why I was confused why you were dissatisfied with trimming the beginning of single line content.

Thinking about it, not trimming makes sense. I'll add a commit to this PR with this behavior.

panglesd · 2025-03-04T09:16:09Z

About the warning, you wrote:

However note that examples #1317 (comment) are actually easy to detect (non-blank content before the column of the marker {[) and for users to fix (simply move the {[ marker at the right place). So a good trade-off could be to tackle these cases with a new warning.

So in some case, it appears OK to raise a warning if odoc has a wrong opinion on the layout of code blocks! That being said, I'm ok to remove this warning.

dbuenzli · 2025-03-04T09:36:31Z

So in some case, it appears OK to raise a warning if odoc has a wrong opinion on the layout of code blocks!

No that's a different warning. It's a warning to help people getting rid of a semantics of code blocks we'd like to get rid of.

I'm more talking about warnings/errors like this one which, while being well-meaning, end up being unhelpful for all sorts of reasons (there were other occurences but I fail to find them).

Note that if by:

Thinking about it, not trimming makes sense.

you meant no initial and trailing trimming that issue may actually be solved by this PR.

panglesd · 2025-03-04T11:14:48Z

In the last changes, I removed the trimming for more than a whitespace only line (as per this comment, which praises a behavior that was lost...)
I updated the docs.
I removed the warning on "multiline code blocks that do not start with a newline" ¹

So now, the semantics of code block content is reasonably easy to explain, and it is possible to have code blocks with arbitrary whitespace in it.

I hope the PR is in a final state in term of design!

although, I would classify that as a kind of comments we want to get rid of: either we lose the alignment between the first and the second line, as the first line is shifted by 2 ({[) or more ({delim@ocaml[) to the left; or we lose the ability to start the first line at the beginning of the line. ↩

jonludlam · 2025-03-04T12:28:37Z

I'd still like to have the raw untrimmed code block in the AST to avoid the need to keep a copy of the original source around for MDX-like uses. Or perhaps we should have that in another PR?

doc/odoc_for_authors.mld

panglesd · 2025-03-04T14:17:25Z

I'd still like to have the raw untrimmed code block in the AST to avoid the need to keep a copy of the original source around for MDX-like uses. Or perhaps we should have that in another PR?

I started doing that in another branch!

Co-authored-by: Daniel Bünzli <[email protected]>

jonludlam

Looks good, just some comments to change!

jonludlam · 2025-03-05T10:44:05Z

src/parser/lexer.mll

+    else
+      match s.[index] with
+      | ' ' | '\t' | '\r' -> handle_first_newline (index + 1)
+      (* Multiline starting with an empty line *)


These comments are a little confusing, and could do with rewording.

jonludlam · 2025-03-05T11:02:58Z

Let's change the comments in your other PR based on this one, to avoid the rebase.

NoahTheDuke mentioned this pull request Feb 20, 2025

WIP: Attempt to fix the "box model" of code blocks #1318

Closed

panglesd added 3 commits March 3, 2025 21:59

Add docs on indentation for code blocks

2171c57

Uniformize verbatims and code blocks

dc1586f

Add changelog entry for ocaml#1137

c74c2cc

panglesd force-pushed the fix-box-model branch from 564b37e to c74c2cc Compare March 3, 2025 21:00

Fix cheatsheet uses of code blocks

1a11e8e

panglesd added 3 commits March 4, 2025 11:44

Code block: Do not trim more than one leading and one trailing line

161b695

Update docs on code blocks

29cb48d

Fix verbatim and Update tests

bf9a778

dbuenzli reviewed Mar 4, 2025

View reviewed changes

doc/odoc_for_authors.mld Outdated Show resolved Hide resolved

dbuenzli reviewed Mar 4, 2025

View reviewed changes

doc/odoc_for_authors.mld Outdated Show resolved Hide resolved

dbuenzli reviewed Mar 4, 2025

View reviewed changes

doc/odoc_for_authors.mld Outdated Show resolved Hide resolved

panglesd added 2 commits March 4, 2025 15:12

Fix off-by-one index

38effce

Precise what to do in case of whitespaced single line code block

00cc1fb

panglesd and others added 2 commits March 4, 2025 16:14

Improve docs on code blocks

ce149e4

Co-authored-by: Daniel Bünzli <[email protected]>

Improve namings for code block content parsing

9e78263

panglesd mentioned this pull request Mar 4, 2025

Store raw content of code blocks in parser's AST #1325

Merged

jonludlam approved these changes Mar 5, 2025

View reviewed changes

jonludlam merged commit 41d293f into ocaml:master Mar 5, 2025
11 checks passed

dbuenzli mentioned this pull request Mar 8, 2025

Readable multi-line quoted string literals ocaml/ocaml#13860

Open

Fix "box model" for code blocks #1317

Fix "box model" for code blocks #1317

Conversation

panglesd commented Feb 20, 2025 • edited Loading

panglesd commented Feb 20, 2025

dbuenzli commented Feb 20, 2025 • edited Loading

panglesd commented Feb 20, 2025

panglesd commented Feb 20, 2025

dbuenzli commented Feb 20, 2025 • edited Loading

dbuenzli commented Feb 20, 2025

dbuenzli commented Feb 20, 2025 • edited Loading

NoahTheDuke commented Feb 20, 2025

jonludlam commented Feb 20, 2025

NoahTheDuke commented Feb 20, 2025

NoahTheDuke commented Feb 20, 2025 • edited Loading

jonludlam commented Feb 20, 2025

NoahTheDuke commented Feb 20, 2025

dbuenzli commented Feb 20, 2025 • edited Loading

NoahTheDuke commented Feb 21, 2025

dbuenzli commented Feb 21, 2025

NoahTheDuke commented Feb 21, 2025

dbuenzli commented Feb 21, 2025

dbuenzli commented Feb 21, 2025

NoahTheDuke commented Feb 21, 2025

panglesd commented Feb 24, 2025

dbuenzli commented Feb 24, 2025

dbuenzli commented Feb 24, 2025

jonludlam commented Feb 24, 2025

panglesd commented Feb 24, 2025 • edited Loading

dbuenzli commented Feb 24, 2025

panglesd commented Feb 24, 2025 • edited Loading

panglesd commented Feb 24, 2025

dbuenzli commented Feb 24, 2025

panglesd commented Mar 3, 2025

dbuenzli commented Mar 3, 2025

panglesd commented Mar 4, 2025

dbuenzli commented Mar 4, 2025

panglesd commented Mar 4, 2025

panglesd commented Mar 4, 2025

dbuenzli commented Mar 4, 2025 • edited Loading

panglesd commented Mar 4, 2025

Footnotes

jonludlam commented Mar 4, 2025

panglesd commented Mar 4, 2025

jonludlam left a comment

Choose a reason for hiding this comment

jonludlam Mar 5, 2025

Choose a reason for hiding this comment

jonludlam commented Mar 5, 2025

panglesd commented Feb 20, 2025 •

edited

Loading

dbuenzli commented Feb 20, 2025 •

edited

Loading

dbuenzli commented Feb 20, 2025 •

edited

Loading

dbuenzli commented Feb 20, 2025 •

edited

Loading

NoahTheDuke commented Feb 20, 2025 •

edited

Loading

dbuenzli commented Feb 20, 2025 •

edited

Loading

panglesd commented Feb 24, 2025 •

edited

Loading

panglesd commented Feb 24, 2025 •

edited

Loading

dbuenzli commented Mar 4, 2025 •

edited

Loading