Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "box model" for code blocks #1317

Merged
merged 21 commits into from
Mar 5, 2025
Merged

Conversation

panglesd
Copy link
Collaborator

@panglesd panglesd commented Feb 20, 2025

TODO:

  • Improve tests.
  • Add docs/check how clear it is

In my mental model, code blocks are supposed to follow a "box model", where the actual content of a code block is the smallest box that contains it all:

image

In the example above, the actual content of the code blocks is in lighter blue. As you can see, some whitespace has been stripped before, after, but also at the beginning of each line.

However, (supposing my mental model is the right one), there are two bugs in the implementation.


The first one is about lines with only whitespace, but less than the box:

image

Suppose that _ are spaces (I used _ to make it visible). I think those spaces are not part of the box and should be stripped, but currently they are part of the content, which would be:

    Here is some content
___
With some other content

  Hey!

The second one is about a corner case of the "box model": when the code block does not start with a new line.

In this case, it is clear what the content of the code block should be:

image

However, in this case it is not!

image

Of course, {[ should not be part of "the box". So, what do we put instead?

One possibility would be to fill it with spaces:

     Here is some content

With some other content

Another solution would be to keep the first line as it is, in this degenerated case:

 Here is some content

With some other content

The currently implemented solution is very confusing in my opinion. For the first line, it removes a number of spaces corresponding to the min between the de-indentation level and the number of opening spaces in the first line. In effect it ignores the offset in which the first line start.

The first solution seems the most consistent with the box model, but also probably the most "breaking change". The second solution seems good as:

{[a
b
c]}

is turned as:

a
b
c

and not

  a
b
c

as it would in the first solution.

Which solution do you think is best?

@panglesd
Copy link
Collaborator Author

I think this PR should serve as a basis for @NoahTheDuke to add the layout information necessary to reconstruct the actual layout from the content. Indeed, without a proper "box model", it will be hard to have a nice layout type.

I think this layout could have the following type:

type code_block_layout = {
  top : string;
  left : int list;
  bottom : string
}

And that it should be possible to reconstruct the content with the following function:

let reconstruct layout content =
  let lines = String.split_on_char '\n' content in
  let lines = List.map2 (fun i l -> (String.make ' ' i ^ l) layout.left lines in
  layout.top ^
  String.concat ~sep:'\n' lines
  ^ layout.bottom

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 20, 2025

I think you rather want to mostly follow what commonmark does here slightly adapted for the fact that the content can be on the line delimiting the block {[. That would be:

  1. Strip leading and trailing whitespace from the contents of {[…]}.
  2. The first non whitespace character in {[…]} defines the left margin column m
  3. For each line, indent to column m, if there is a non-whitespace character before column m
  4. For each line strip whitespace < m.

@panglesd
Copy link
Collaborator Author

That would be a possibility too, though more breaking (although I'm not sure how many code blocks would change in practice).

That being said, I'm not sure how close this is to what commonmark does. From what I understand, commonmark only looks at the indentation of the opening fence (as mentioned in this example). In what you suggest, it's not the fence opening which determines the left margin, but first non-whitespace character, which feels quite different to me.

We could also use the fence to determine left-margin. But, in

{[
    bli
]}

the content would become indented.

I would be in favor of keeping the current box model. It's fewer changes, and only few people would relate the "common-mark like" proposal to their experience of commonmark. But I'm actually fine with both.

@panglesd
Copy link
Collaborator Author

Also, the box model allows to start with spaces (but not newlines), and so do commonmark. Using the first non-whitespace as left-margin does not allow that! (Although, I cannot give an example of a plausible scenario where we need to do that, I'm sure there is one!)

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 20, 2025

That being said, I'm not sure how close this is to what commonmark does. From what I understand, commonmark only looks at the indentation of the opening fence (as mentioned in this example). In what you suggest, it's not the fence opening which determines the left margin,

Yes it is different: there is never any content on the opening fence in commonmark so you can use the marker for defining the indentation level.

Using the first non-whitespace as left-margin does not allow that

You could escapes for that but I'm no longer sure how escapes work in ocamldoc it has always been a bit ill-defined an confusing.

That being said if your goal is this then you would be better off keeping a precise location of the start of content and simply keep the raw string of the content in the node.

@dbuenzli
Copy link
Contributor

keep the raw string of the content in the node.

(And not even, if you have precise start/end locations and the source you can simply extract the part)

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 20, 2025

I would be in favor of keeping the current box model.

Note that the problem with the current box model, is that it's not very clear. To this day I always write code fences fully flushed left to make sure I get what I want… This allows for a clear mental model in my opinion: namely something clear has to define the indentation level in the beginning of the code fence and then everything is simply left aligned on that indentation level with any whitespace before reaching the indentation level being dropped.

@NoahTheDuke
Copy link
Contributor

Ocaml's comments are naturally indented, which makes using Commonmark's style harder. It requires deleting indentation that an editor or formatter might try to add.

@jonludlam
Copy link
Member

(And not even, if you have precise start/end locations and the source you can simply extract the part)

Mdx does it this way - by getting the locations (which are indeed precise) and splicing the text from the original file.

@NoahTheDuke
Copy link
Contributor

To be clear, the only benefit to handling this is for display, right? Ocaml doesn't care about indentation, so Mdx and/or an odoc extract command would just extract the raw text and slap it into a file. Having the precise location of the text within the {[...]} tags is only meaningful for displaying in docs files or hovers.

@NoahTheDuke
Copy link
Contributor

NoahTheDuke commented Feb 20, 2025

I think the box model is the correct way to think about code blocks (and verbatim blocks, while we're at it).

ocaml/ocaml doesn't have a consistent way of indenting code blocks right now.

1) stdlib/float.mli puts the first line against the braces and then indents the following lines to the column where the braces sit (note the 2 space indent to everything, including the opening of the comment block):

  (** ...
      For example, consider the following program:
  {[let size = 100_000_000
  let a = Float.Array.make size 1.
  let update a f () =
     Float.Array.iteri (fun i x -> Float.Array.set a i (f x)) a
  let d1 = Domain.spawn (update a (fun x -> x +. 1.))
  let d2 = Domain.spawn (update a (fun x ->  2. *. x +. 1.))
  let () = Domain.join d1; Domain.join d2
  ]}
  ... *)

2) stdlib/condition.mli has a line break and then indents to the column one past the square brace:

(** ...
   {[
     Mutex.lock m;
     while not P do
       Mutex.unlock m; Mutex.lock m
     done;
     <update the data structure>;
     Mutex.unlock m
   ]}
... *)

3) stdlib/bytes.mli has a line break and no indentation:

(** ...
   {[
let string_init len f : string =
  let s = Bytes.create len in
  for i = 0 to len - 1 do Bytes.set s i (f i) done;
  Bytes.unsafe_to_string s
   ]}
... *)

4) Another part of stdlib/bytes.mli dedents the braces, has no spaces or line break, and no indentation of the code block:

(** ...
    For example, consider the following program:
{[let size = 100_000_000
let b = Bytes.make size  ' '
let update b f ()  =
  Bytes.iteri (fun i x -> Bytes.set b i (Char.chr (f (Char.code x)))) b
let d1 = Domain.spawn (update b (fun x -> x + 1))
let d2 = Domain.spawn (update b (fun x -> 2 * x + 1))
let () = Domain.join d1; Domain.join d2
]}
... *)

5) stdlib/arg.mli dedents the braces, and indents the code block two spaces past where the text above sits (an awkward 5 spaces):

(** ...
   This module provides a general mechanism for extracting options and
   arguments from the command line to the program. For example:

{[
     let usage_msg = "append [-verbose] <file1> [<file2>] ... -o <output>"
     let verbose = ref false
     let input_files = ref []
     let output_file = ref ""
... *)

6) stdlib/nativeint.mli has a line break and then indents the code block to align with the square brace instead of 1 past:

(** ...
    {[
     let zero: nativeint = 0n
     let one: nativeint = 1n
     let m_one: nativeint = -1n
    ]}
... *)

7) stdlib/queue.mli has a line break and then indents to align with the curly brace:

(** ...
    {[
    # let q = Queue.create ()
    val q : '_weak1 Queue.t = <abstr>
... *)

8) stdlib/dynarray.ml (not .mli) has no line break and indents to 1 past the square braces:

  (** ...
      We could instead provide
      [val find : ('a, 'stamp) with_dummy -> ('a, 'stamp dummy) result]
      but this would involve intermediary allocations.

      {[match find x with
        | None -> ...
        | Some v -> ...]}
  ... *)

9) Most cursed of all, stdlib/either.mli dedents the braces, has no linebreak and then indents past the code of the initial line as if it was not on the same line as the braces:

(** ...
    For example:

{[List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list]}
... *)

(This is the only instance of this style in ocaml/ocaml, which means we can safely ignore it (or even change it directly to align with other styles).)

All of the code blocks except for 9) are consistent with the below algorithm.

  1. If there is any text on the line containing the opening braces, ignore it for now.
  2. Find the least indented line in the remaining lines, store the indent as the left edge of the box.
  3. Remove the left edge indentation of each remaining line.
  4. If there is any text on the line containing the opening braces, trim the whitespace and insert it at the front.

@jonludlam
Copy link
Member

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread. If we did it that way we'd avoid the need for the original source as currently required by mdx.

@NoahTheDuke
Copy link
Contributor

I took a shot at implementing my algorithm above which led to #1318. Can be helpful to compare the two sets of outputs.

@dbuenzli
Copy link
Contributor

dbuenzli commented Feb 20, 2025

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread.

That looks ok to me.

Ocaml's comments are naturally indented, which makes using Commonmark's style harder. It requires deleting indentation that an editor or formatter might try to add.

Nothing different here, CommonMark also does that. The only difference is that CommonMark's delimiters stand on their own line (and is thus easier to use as the indentation mark).

To be clear, the only benefit to handling this is for display, right?

No, you need precise locations and layout for compilation errors of the extracted source code to be reported at the right place in the original. You need a line directive followed by the exact, verbatim, code for compilation errors to show up exactly at the right place in the source.

All of the code blocks except for 9) are consistent with the below algorithm.

Unless I'm mistaken that is the case for my proposal too. Personally I don't see any benefit of using the line with the smallest indent as you suggest (rather than the first line):

  1. It makes it harder to spot the indentation point on larger code samples.
  2. It's more brittle to careless c&p and/or additions.

Besides the same problem as @panglesd mentioned remains: how do you solve the problem of enforcing an initial indent ? For the latter I have a few ideas but I don't really like them. One would be to simply use the non newline whitespace directly following {[ to define the indent if the line is blank (made of only whitespace). It is obscure but then we can say that it's likely only used for 0.01% of the cases.

@NoahTheDuke
Copy link
Contributor

The reason you use the lowest indent is that in example 9), if it had multiple lines, then you'd account for additional indentation on the second line:

(** ...
    For example:

{[List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list

  List.partition_map:
    ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list]}
... *)

renders as

       [%expect {|
         ((output
           (((f.ml (2 4) (2 16))
             (paragraph
              (((f.ml (2 4) (2 7)) (word For)) ((f.ml (2 7) (2 8)) space)
               ((f.ml (2 8) (2 16)) (word example:)))))
            ((f.ml (4 0) (8 63))
             (code_block
              ((f.ml (4 2) (8 61))
                "List.partition_map:\
               \n  ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list\
               \n\
               \nList.partition_map:\
               \n  ('a -> ('b, 'c) Either.t) -> 'a list -> 'b list * 'c list")))))
          (warnings ())) |}]

In what scenarios do you want to enforce an initial indent in a code block?

@dbuenzli
Copy link
Contributor

The reason you use the lowest indent

From a usability point of view you turn the problem of understanding how stuff is going to be indented into checking n (i.e. every) lines rather than 1. This becomes particularly acute when your code blocks spans more than one screenful. So I really rather not have that solution.

In what scenarios do you want to enforce an initial indent in a code block?

Who knows ? There's always the odd case out there and it's enraging the day you need it. But a simple example is ASCII art.

{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}

@NoahTheDuke
Copy link
Contributor

Your example would be correctly handled by finding the lowest indent (tape), and incorrectly handled by using the first line's indent ( +).

Using lowest indent:

     let stdlib_either =
       test "
{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}
       ";
       [%expect {|
         ((output
           (((f.ml (2 0) (6 2))
             (code_block
              ((f.ml (2 2) (6 0))
                "     +---+---+---\
               \ntape | A | A | \226\128\166\
               \n     +---+---+--- ")))))
          (warnings ())) |}]

Using first line indent:

     let stdlib_either =
       test "
{[
     +---+---+---
tape | A | A | …
     +---+---+--- 
]}
       ";
       [%expect {|
         ((output
           (((f.ml (2 0) (6 2))
             (code_block
              ((f.ml (2 2) (6 0))
                "+---+---+---\
               \n| A | A | \226\128\166\
               \n+---+---+--- ")))))
          (warnings ())) |}]

@dbuenzli
Copy link
Contributor

Right, now suppose you want to write:

Here is the box:

{[
+---+
| A |
+---+
]}

After shifting one position to the right you get this configuration:

{[
    +---+
    | A |
    +---+
]}

@dbuenzli
Copy link
Contributor

We could also use the fence to determine left-margin. But, in

{[
    bli
]}

the content would become indented.

I think in the end that this is actually the more natural and less confusing overall (and allows to specify any initial indent you wish without fuss). There is the question of what happens when you have:

  {[bli
  …
  ]}

but you can simply say that this is equivalent to:

  {[
  bli
  …
  ]}

So basically the first column of {[ defines the left margin. Anything that is on the left of that margin is automatically indented to the margin. The rule is simple to understand and allows for any hand-made ASCII layout you wish.

@NoahTheDuke
Copy link
Contributor

That's pretty clever. Thanks for thinking it through with us.

@panglesd
Copy link
Collaborator Author

Thanks for all the ideas and discussion. Although this PR is "just" a fix for an already established syntax, and not an attempt to change it, it's good to have feedback on it, and we can maybe change it. I'll try to sum up the choice we have, but first:

Just thinking about the implementation, I was wondering about maybe having the raw unaltered text in the AST, then a separate function that applies the formatting rules being suggested in this thread. If we did it that way we'd avoid the need for the original source as currently required by mdx.

Mdx currently changes the code blocks. But you want it to change them while preserving the layout. For instance:

{[
              # let x = 1 ;;
]}
{[
# let x = 1 ;;
]}

should be rewritten to

{[
              # let x = 1 ;;
              val x : int
]}
{[
# let x = 1 ;;
val x : int
]}

so that eg it does not fight with ocamlformat which is responsible for the layout.

That is why we might need a bit more than the raw string, but a standardized way of going from/to raw content/deindented content.


I'll try to sum up the pros and cons of each possibility, at this point:

Here are the nice properties we would like to have:

  1. It is easy for the user to "intuitively" know what will be the content of the code block from the raw string
  2. It does not break too many existing documentation
  3. It allows to have an indentation in the first line
  4. Be fit for long code blocks:

    Be easy to spot the indentation point on larger code samples.
    Not be brittle to careless c&p and/or additions.

We have several candidates for that:
A. Using the smallest indent.

  1. I think that (with this PR fixing the bugs) it's as intuitive as the others
  2. It's just fixing bugs so should not break any documentation
  3. Does allow to have an indentation as the first line
  4. Not so fit for long code blocks as the indentation depend on it all

B. Using the first indent

  1. I think it's as intuitive as the others. Eg I think @NoahTheDuke got it wrong here: tape should be part of the output which would break the indentation
  2. Breaking impact has to be measured but not that much code starts with an indentation
  3. Does not allow to have an indentation as the first line which is a problem for ascii art-based languages ^^
  4. Fit for long code blocks

C. Using the opener's indent

  1. I think it's as intuitive as the others.
  2. Breaking impact has to be measured. At least ocamlformat's default formatting makes it non-breaking.
  3. Allow to have an indentation as the first line
  4. Fit for long code blocks

So I'm fine with A. or C. as long as we find out it's not going to break to many code blocks!

I'll try to investigate C's breaking.

@dbuenzli
Copy link
Contributor

Btw. I'm just curious are verbatim blocks treated differently from code blocks ? Because with the odoc3. I'm currently forced to write e.g.:

    {- [index_properties], the index properties. 0 if not indexed.
{v
7                               0
+---+---+---+---+---+---+---+---+
|   |   |   |   | L | S | T | F |
+---+---+---+---+---+---+---+---+
v}
    }

To make sure the verbatim block is not indented. If I write it:

    {- [index_properties], the index properties. 0 if not indexed.
       {v
       7                               0
       +---+---+---+---+---+---+---+---+
       |   |   |   |   | L | S | T | F |
       +---+---+---+---+---+---+---+---+
       v}
    }

the block is indented:
Screenshot 2025-02-24 at 12 32 07

@dbuenzli
Copy link
Contributor

So I'm fine with A. or C. as long as we find out it's not going to break to many code blocks!

Note that only C allows you to write this example. The problem is not about being able to indent on the first line, it's to be able to indent the whole content as you wish in the code block.

@jonludlam
Copy link
Member

That is why we might need a bit more than the raw string, but a standardized way of going from/to raw content/deindented content.

Yes, that's what I'm asking for.

As an example where we might want the raw text rather than the processed stuff, we could do a better job of error messages than mdx does today by inserting line directives, so that the line numbers matched the source file correctly. However, I don't think there's such a thing as 'column directives', so to get the column numbers to match the source file correctly you could insert each line in exactly from the source, including the indentation.

@panglesd
Copy link
Collaborator Author

panglesd commented Feb 24, 2025

Some more though about C:
If we use the first character ({) of the opening, we get that ocamlformat automatically indent code blocks, that is lots of code blocks are affected by this change:

(** I'm formatted by ocamlformat

    {[
      hello ocamlformat
    ]} *)

renders as hello ocamlformat

If we use the last character ([), then many things will get de-indented:

(** I'm formatted by ocamlformat

    {@ocaml[
      let x =
        match w with
        | AAAAAAAAAAAAAAAAAAAAAAAA -> 1
        | BBBBBBBBBBBBBBBBBBBBBBBBB -> 2
    ]} *)

will render as

let x =
match w with
| AAAAAAAAAAAAAAAAAAAAAAAA -> 1
| BBBBBBBBBBBBBBBBBBBBBBBBB -> 2

We could use as indent the position of { + 2 but well...

@dbuenzli
Copy link
Contributor

so to get the column numbers to match the source file correctly you could insert each line in exactly from the source, including the indentation.

Yes that's what you need if you want to use line directives.

we get that ocamlformat automatically indent code blocks

Well this could be changed in ocamlformat.

In any case I don't think it's a good idea to indent after {[, you are usually already quite nested and you still want to preserve horizontal space to write your code blocks.

@panglesd
Copy link
Collaborator Author

panglesd commented Feb 24, 2025

Well this could be changed in ocamlformat.

Yes, what I'm saying is more: there are currently many projects using ocamlformat, with automatically indented code blocks. Changing the indentation to be based on the opening { will "break" all of them. (with "breaking" meaning adding two spaces in front)

About horizontal space: I'm slightly concerned with

  Some
    reason
      to
        indent 
          {[
let x =
  1
          ]}

being rendered as

let x =
1

@panglesd
Copy link
Collaborator Author

Out of the example given by @NoahTheDuke at #1317 (comment), I think that examples 2, 3, 5 and 6 will get their rendering wrong:

  • 2 5 and 6 will be indented too much
  • 3 will be deindented

@dbuenzli
Copy link
Contributor

Note the whole code block semantics has already been through a few changes and it's perhaps a good idea to review these discussions to what got us to what we have now.

#39
#134
#133
#132

Personally I don't think the right decision was taken at the time (it seems they did not realize that you could not enforce arbitrary layouts) but perhaps it's not worth changing again. However I have to say, as far as I'm concerned, the only reliable way of working with code block and verbatims in ocamldoc has remained the "ugly workaround" mentioned here, that is always write all that fully flushed left; so the status quo may not be optimal either.

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 3, 2025

This PR is ready for review:

  • Implements what has been decided here, with the slight changes that single line code blocks ({[ blibli ]}) are trimmed and accepted, as discussed with @jonludlam.
  • Raises warnings both on "too few indentation" and "multiline code blocks not starting with a new line".
  • Verbatims are treated the same way as code blocks. It was previously not specified how they are treated in the documentation, and it seems good to have them as code blocks. From what I can understand, everyone agrees with that in the discussion, but there was no discussion focused on that, so let me know if I'm wrong.
  • Tests are updated.
  • Docs are updated.

@dbuenzli
Copy link
Contributor

dbuenzli commented Mar 3, 2025

  • that single line code blocks ({[ blibli ]}) are trimmed and accepted,

Is it really worth making an exception here ? It makes the construct more complex to understand and perhaps that some explanations can be written more compactly that way.

For example:

In fact the we can simply replace the line 
  {[  let x = List.iter f v  ]}
of the above function by 
  {[  let y = Array.iter f v ]}
and get the result we wish.
  • and "multiline code blocks not starting with a new line".

Are you sure about that one ? In the past odoc started to warn on a few things that turned out to be rather annoying, I would rather be conservative and not do that.

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 4, 2025

It makes the construct more complex to understand and perhaps that some explanations can be written more compactly that way.

I'm a bit confused! The first part of the sentence seems to say it's not worth it, and the second part seem to say that it is worth it 🫤
Treating single line code blocks as "exceptional" have three effects:

  • It allows to trim at the end of the string (in {[ hello ]} we probably don't want the ending space. Currently code blocks trim trailing lines but not trailing space on the last line, I kept that behavior).
  • We don't raise a warning.
  • We explain it a bit differently. I could not find a way to explain it that was easy to understand both for one-liners and multilines, without explaining those two cases separately. I think it is fine if "multiline content not starting by a new line" is harder to understand.

So, if we trim trailing spaces in code blocks, which maybe we should do, there is no distinction in how multiline/single lines content is handled. But I would still prefer to explain how they are handled "case by case".

In the past odoc started to warn on a few things that turned out to be rather annoying, I would rather be conservative and not do that.

I think hope maybe that was because odoc was warning about dependencies problems, and that with the new flags which allow to filter warnings the situation will be better, which will allow to fix the warnings as they come.

@dbuenzli
Copy link
Contributor

dbuenzli commented Mar 4, 2025

I'm a bit confused! The first part of the sentence seems to say it's not worth it, and the second part seem to say that it is worth it 🫤

No I think we misunderstood each other

You said:

  • Implements what has been decided here, with the slight changes that single line code blocks ({[ blibli ]}) are trimmed and accepted, as discussed with @jonludlam.

To me this means String.trim so also trimming the initial space which is what I would object (I care a bit less about the trailing thing if you want to make an exception here). If you keep the initial white that's fine with me.

But then I just find that simply saying that:

{|… 

is strictly equivalent to

{|
…

and then explain how:

{|
…
|}

works is the easiest. You no longer need a notion of single or multi line code block. There are just code blocks.

I think hope maybe that was because odoc was warning about dependencies problems, and that with the new flags which allow to filter warnings the situation will be better, which will allow to fix the warnings as they come.

Not at all. It was warnings about the way I wrote things in odoc that suddenly odoc had opinions about. It was quite annoying.

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 4, 2025

No I think we misunderstood each other

Indeed!

What I had understood from this comment, was that:

  {[   text after space
  ...
  ]}

was considered as:

  {[
  text after space
  ...
  ]}

but in fact, I now think you meant that it is turned into:

  {[
     text after space
  ...
  ]}

So in what I understood version, the beginning was trimmed anyway, which explains why I was confused why you were dissatisfied with trimming the beginning of single line content.

Thinking about it, not trimming makes sense. I'll add a commit to this PR with this behavior.

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 4, 2025

About the warning, you wrote:

However note that examples #1317 (comment) are actually easy to detect (non-blank content before the column of the marker {[) and for users to fix (simply move the {[ marker at the right place). So a good trade-off could be to tackle these cases with a new warning.

So in some case, it appears OK to raise a warning if odoc has a wrong opinion on the layout of code blocks! That being said, I'm ok to remove this warning.

@dbuenzli
Copy link
Contributor

dbuenzli commented Mar 4, 2025

So in some case, it appears OK to raise a warning if odoc has a wrong opinion on the layout of code blocks!

No that's a different warning. It's a warning to help people getting rid of a semantics of code blocks we'd like to get rid of.

I'm more talking about warnings/errors like this one which, while being well-meaning, end up being unhelpful for all sorts of reasons (there were other occurences but I fail to find them).

Note that if by:

Thinking about it, not trimming makes sense.

you meant no initial and trailing trimming that issue may actually be solved by this PR.

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 4, 2025

In the last changes, I removed the trimming for more than a whitespace only line (as per this comment, which praises a behavior that was lost...)
I updated the docs.
I removed the warning on "multiline code blocks that do not start with a newline" 1

So now, the semantics of code block content is reasonably easy to explain, and it is possible to have code blocks with arbitrary whitespace in it.

I hope the PR is in a final state in term of design!

Footnotes

  1. although, I would classify that as a kind of comments we want to get rid of: either we lose the alignment between the first and the second line, as the first line is shifted by 2 ({[) or more ({delim@ocaml[) to the left; or we lose the ability to start the first line at the beginning of the line.

@jonludlam
Copy link
Member

I'd still like to have the raw untrimmed code block in the AST to avoid the need to keep a copy of the original source around for MDX-like uses. Or perhaps we should have that in another PR?

@panglesd
Copy link
Collaborator Author

panglesd commented Mar 4, 2025

I'd still like to have the raw untrimmed code block in the AST to avoid the need to keep a copy of the original source around for MDX-like uses. Or perhaps we should have that in another PR?

I started doing that in another branch!

Copy link
Member

@jonludlam jonludlam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some comments to change!

else
match s.[index] with
| ' ' | '\t' | '\r' -> handle_first_newline (index + 1)
(* Multiline starting with an empty line *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are a little confusing, and could do with rewording.

@jonludlam
Copy link
Member

Let's change the comments in your other PR based on this one, to avoid the rebase.

@jonludlam jonludlam merged commit 41d293f into ocaml:master Mar 5, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants