Optionally compute lineOffsets, refs #67 #68
I think your proposed API is not the right solution here, mainly for two reasons:
- It adds a new `options` argument for `parseCST()`, which we've managed to avoid so far.
- It requires knowing that you may want to access the line offsets when calling `parseCST()`, rather than optionally doing so later.
I don't have a complete solution envisioned here, but the user story that I'd like the line offset API to solve is something like, "I've parsed this string, and now I have this offset that I'd like to identify as a line/col position." How to do that efficiently is a further problem, and definitely of less importance than having a clean API. `cst.lineOffsets` does sound like roughly the right sort of place for caching, but that doesn't mean that we need to populate the cache pre-emptively.
What do you think? Sorry to leave this response a bit incomplete; I'll need to return to this a bit later.
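A minimal sketch of that user story, with the offsets computed only on the first lookup and cached afterwards. The name `makeLineLookup` and the returned shape are hypothetical, chosen for illustration and not taken from the library; positions here are zero-based and the offsets array is assumed to start with `0` for the first line.

```javascript
// Hypothetical helper: nothing is computed until the first lookup.
function makeLineLookup(src) {
  let lineOffsets = null // cache, filled lazily

  function getLineOffsets() {
    if (lineOffsets) return lineOffsets
    lineOffsets = [0] // assumption: the first line starts at offset 0
    let i = src.indexOf('\n')
    while (i !== -1) {
      lineOffsets.push(i + 1) // the next line starts just after the newline
      i = src.indexOf('\n', i + 1)
    }
    return lineOffsets
  }

  return function lineCol(offset) {
    if (typeof offset !== 'number' || offset < 0 || offset > src.length)
      return null
    const offsets = getLineOffsets()
    let line = 0
    // Simple linear scan: keep the last line start that is <= offset.
    for (let i = 1; i < offsets.length; ++i) {
      if (offsets[i] > offset) break
      line = i
    }
    return { line, col: offset - offsets[line] }
  }
}

const lineCol = makeLineLookup('a: 1\nb: 2\n')
console.log(lineCol(5)) // zero-based position of "b"
```

The caller parses first and asks for a position later, so no option has to be passed at parse time.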
src/cst/parse.js
Outdated
  return { line: undefined, col: undefined }
const lineIndex = lineOffsets.indexOf(offset)
if (lineIndex >= 0)
  return { line: lineIndex, col: offset - lineOffsets[lineIndex] }
This is special-case handling for when `offset` points to a `\n` character, yes? Why couldn't it be handled by the following `for` loop?
I may have optimised pre-emptively here (which is a mistake without measuring it). It just avoids the search in JavaScript and gives the optimised `indexOf` method a chance. As the array is always sorted, perhaps a binary-search-like approach would be better than this plus the loop, especially for large documents?
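For reference, the binary-search idea could look roughly like the sketch below. `findLine` is a hypothetical name, and the code assumes `lineOffsets` is sorted ascending and begins with `0`; whether it beats a plain loop on real documents would need measuring.

```javascript
// Hypothetical helper: index of the last line start that is <= offset.
// Assumes lineOffsets is sorted ascending and lineOffsets[0] === 0.
function findLine(lineOffsets, offset) {
  let lo = 0
  let hi = lineOffsets.length - 1
  while (lo < hi) {
    // Take the upper middle so the range shrinks on every iteration.
    const mid = (lo + hi + 1) >> 1
    if (lineOffsets[mid] <= offset) lo = mid
    else hi = mid - 1
  }
  return lo
}

const lineOffsets = [0, 5, 10]
const line = findLine(lineOffsets, 7)
console.log({ line, col: 7 - lineOffsets[line] })
```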
I'd recommend first going with just a simple for loop, and then seeing if there's a real-world case that would show a need for a more complicated search pattern.
src/cst/parse.js
Outdated
src = src.replace(/\n/g, (match, offset) => {
  lf.push(offset + 1)
  return '\n'
})
`src.replace()` is far too heavy an operation for this; use `src.indexOf('\n', fromIndex)` instead. There's no reason to mutate the string here.
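A rough sketch of the `indexOf`-based scan suggested here. The function name `getLineOffsets` and the leading `0` entry are assumptions for illustration; the quoted diff does not show how `lf` is initialised.

```javascript
// Hypothetical helper: collect the offset at which each line starts,
// without building a new string along the way.
function getLineOffsets(src) {
  const offsets = [0] // assumption: the first line starts at offset 0
  let i = src.indexOf('\n')
  while (i !== -1) {
    offsets.push(i + 1) // same value the replace() callback pushed
    i = src.indexOf('\n', i + 1)
  }
  return offsets
}

console.log(getLineOffsets('a: 1\nb: 2\n')) // [ 0, 5, 10 ]
```

Each `indexOf` call resumes from the previous match, so the string is scanned once and never copied.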
Cool. Will amend in the next revision.
src/index.js
Outdated
return parseCST(src).map(cstDoc => new Document(options).parse(cstDoc))
return parseCST(src, {
  computeLineOffsets: options ? options.computeLineOffsets : false
}).map(cstDoc => new Document(options).parse(cstDoc))
The result of the `cst.map()` operation will not contain the non-iterable properties of `cst`, which means that setting `computeLineOffsets` will not influence the output of this function.
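The behaviour described here can be reproduced with plain arrays. In this sketch the string elements stand in for CST documents and `lineOffsets` is attached by hand; it illustrates `Array.prototype.map`, not the library's actual data structures.

```javascript
// Stand-in for the parsed CST: an array with an extra, non-index property.
const cst = ['doc1', 'doc2']
cst.lineOffsets = [0, 5, 10]

// map() builds a fresh array from the elements only.
const mapped = cst.map(doc => doc.toUpperCase())
console.log(mapped.lineOffsets) // undefined

// The property has to be carried over explicitly if it is wanted.
mapped.lineOffsets = cst.lineOffsets
console.log(mapped.lineOffsets === cst.lineOffsets) // true
```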
Ok, will take another look.
src/cst/parse.js
Outdated
  return documents
}

export function charPosToLineCol(offset, lineOffsets) {
Is there a particular reason why this function is being exported from `cst/parse.js`, rather than its own file?
No, just was unsure of your convention on this.
I've got no problems with computing the Would that (plus the above changes) be enough for me to do the next revision of this PR?
I'm fine with
The likely (initial) recommendation I think would be to reparse the document(s) at the CST level. I don't know how acceptable you find that... We might be able to do something for the AST where I plan to follow this up with another PR (or a plugin, or a separate module) which can decorate the CST with JSON pointers allowing a lookup from the JS object to the CST, and then to the line/column position. JSON pointers being widely used in the JSON-Schema and OpenAPI worlds. Just to make you aware, the other PR/plugin/module I have in mind is the optional ability to be able to preserve comments when
If there's a good reason to extend the API, then we can do that -- but it does require a good reason. If your application is usually interested in the CST, then it makes sense to explicitly build the The main use case that I'd expect for line/col references outside of those cases would be errors, when such are not expected. It would be useful, when encountering one, to be able to use the error's context to build a line/col reference to the source. Then, if the offsets are cached as I've never really worked with JSON pointers, but I'd suspect that it'd be relatively simple to write a getter method for the AST Document that would help with that. Maybe something like Regarding comments, what use case did you have in mind for the object with comments? My suspicion is that once you account for all the comments that YAML allows for, you'd end up with something very similar to the existing Document object.
Simply for passing into existing code which can only handle native JS objects in a fixed representation, and receiving it back (possibly mutated) and then writing it back out as YAML. For example, a routine which converts an OpenAPI 2.0 document to OpenAPI 3.x. The comments could be converted into
@MikeRalphson I updated your branch with the changes I'd requested, so that this can be merged and released along with other recent changes. Essentially, the public API is now a new getter The For errors, the line positions are now available from
Just as a heads-up, I realised after releasing 1.3.0 that zero-indexing the line/col positions was surprising, so pushed out a patch release 1.3.1 that fixes our behaviour to follow the norm of one-indexing these positions.
As discussed in #67, this PR adds the optional ability to compute line offsets and provides a helper function `charPosToLineCol`.
Both the AST and CST parse(Document) functions can now accept a `computeLineOffsets` option. Tests added to ensure default behaviour is unchanged and for the new functionality.
`README.md` has been minimally updated, but not yet the docs. I'd appreciate some guidance on doing that.