```{r, setup, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  tidy = FALSE,
  error = FALSE,
  fig.width = 8,
  fig.height = 8
)
```
# xmlparsedata
> Parse Data of R Code as an 'XML' Tree
Convert the output of `utils::getParseData()` to an XML tree that is
searchable and easier to manipulate in general.
---
- [Installation](#installation)
- [Usage](#usage)
  - [Introduction](#introduction)
  - [`utils::getParseData()`](#utilsgetparsedata)
  - [`xml_parse_data()`](#xml_parse_data)
  - [Renaming some tokens](#renaming-some-tokens)
  - [Search the parse tree with `xml2`](#search-the-parse-tree-with-xml2)
- [License](#license)
## Installation
```{r eval = FALSE}
# Install the released version from CRAN
install.packages("xmlparsedata")
```
## Usage
### Introduction
In recent R versions the parser can attach source code location
information to the parsed expressions. This information is often
useful for static analysis, e.g. code linting. It can be accessed
via the `utils::getParseData()` function.
`xmlparsedata` converts this information to an XML tree.
The R parser's token names are preserved in the XML as much as
possible, but some of them are not valid XML tag names, so they are
renamed (see below).
### `utils::getParseData()`
`utils::getParseData()` summarizes the parse information in a data
frame. The data frame has one row per expression tree node, and each
node points to its parent. Here is a small example:
```{r}
p <- parse(
  text = "function(a = 1, b = 2) { \n a + b\n}\n",
  keep.source = TRUE
)
getParseData(p)
```
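As a minimal sketch of how the `parent` column links the rows together, the chunk below looks up the token of each node's parent with `match()`. The names `pd_demo` and `parent_token` are illustrative only and not part of any package. Top-level nodes have parent `0`, so their lookup yields `NA`.

```{r}
# Illustrative only: a tiny expression parsed with source references kept
pd_demo <- utils::getParseData(parse(text = "a + b", keep.source = TRUE))

# Look up each node's parent row by id; top-level nodes (parent 0) give NA
parent_token <- pd_demo$token[match(pd_demo$parent, pd_demo$id)]

data.frame(
  id = pd_demo$id,
  token = pd_demo$token,
  text = pd_demo$text,
  parent = pd_demo$parent,
  parent_token = parent_token
)
```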
### `xml_parse_data()`
`xmlparsedata::xml_parse_data()` converts the parse information to
an XML document. It works similarly to `getParseData()`. Specify the
`pretty = TRUE` option to pretty-indent the XML output. Note that this
has a small overhead, so if you are parsing large files, I suggest you
omit it.
```{r}
library(xmlparsedata)
xml <- xml_parse_data(p, pretty = TRUE)
cat(xml)
```
The top XML tag is `<exprlist>`, which is a list of
expressions; each expression is an `<expr>` tag. Each tag
has attributes that define its location: `line1`, `col1`,
`line2`, `col2`. These names come from the columns of the
`getParseData()` data frame.
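The location attributes can be read back with `xml2`. The following is a small sketch, assuming the `xml2` package is installed; the two-line input and the names `doc_calls` and `top_exprs` are made up for illustration.

```{r}
library(xmlparsedata)
library(xml2)

# Two top-level calls, one per line (illustrative input)
doc_calls <- read_xml(
  xml_parse_data(parse(text = "f(1)\ng(2)", keep.source = TRUE))
)

# Select the top-level expressions and collect their location attributes
top_exprs <- xml_find_all(doc_calls, "/exprlist/expr")
data.frame(
  line1 = xml_attr(top_exprs, "line1"),
  col1 = xml_attr(top_exprs, "col1"),
  line2 = xml_attr(top_exprs, "line2"),
  col2 = xml_attr(top_exprs, "col2")
)
```

Note that `xml_attr()` returns character vectors, so convert with `as.integer()` before doing arithmetic on positions.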
### Renaming some tokens
The R parser's token names are preserved in the XML as much as
possible, but some of them are not valid XML tag names, so they are
renamed. See the `xml_parse_token_map` vector for the mapping:
```{r}
xml_parse_token_map
```
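To see the renaming in action, the sketch below compares the parser's raw token names with the XML tag names for a single expression. It assumes `xml2` is installed; `colon_expr`, `raw_tokens` and `xml_tags` are illustrative names.

```{r}
library(xmlparsedata)
library(xml2)

colon_expr <- parse(text = "1:2", keep.source = TRUE)

# Token names as reported by the R parser
raw_tokens <- unique(utils::getParseData(colon_expr)$token)

# Tag names of all elements in the corresponding XML document
doc_colon <- read_xml(xml_parse_data(colon_expr))
xml_tags <- unique(xml_name(xml_find_all(doc_colon, "//*")))

raw_tokens
xml_tags
```

The colon operator appears as `':'` among the parser tokens and as `OP-COLON` among the XML tags, while names such as `expr` and `NUM_CONST` are unchanged.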
### Search the parse tree with `xml2`
The `xml2` package can search XML documents using
[XPath](https://en.wikipedia.org/wiki/XPath) expressions. This is often
useful to search for specific code patterns.
As an example, we search a source file from base R for `1:nrow(<expr>)`
expressions. These are usually unsafe: if `nrow()` returns zero, the
expression is equivalent to `1:0`, i.e. `c(1, 0)`, which is usually not
the intended behavior.
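A short base R illustration of the problem, using a made-up empty data frame `empty_df`. `seq_len()` is shown only as a common alternative, not something this package prescribes.

```{r}
empty_df <- data.frame(x = numeric(0))

# nrow() is 0, so the colon operator counts down from 1 to 0
1:nrow(empty_df)

# seq_len() returns an empty sequence instead, so a loop body never runs
seq_len(nrow(empty_df))
```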
We load and parse the file directly from the R source code mirror
at https://github.com/wch/r-source:
```{r}
url <- paste0(
  "https://raw.githubusercontent.com/wch/r-source/",
  "4fc93819fc7401b8695ce57a948fe163d4188f47/src/library/tools/R/xgettext.R"
)
src <- readLines(url)
p <- parse(text = src, keep.source = TRUE)
```
and we convert it to an XML tree:
```{r}
library(xml2)
xml <- read_xml(xml_parse_data(p))
```
The `1:nrow(<expr>)` expression corresponds to the following
tree in R:
```
<expr>
  +-- <expr>
      +-- NUM_CONST: 1
  +-- ':'
  +-- <expr>
      +-- <expr>
          +-- SYMBOL_FUNCTION_CALL nrow
      +-- '('
      +-- <expr>
      +-- ')'
```
```{r}
bad <- xml_parse_data(
  parse(text = "1:nrow(expr)", keep.source = TRUE),
  pretty = TRUE
)
cat(bad)
```
This translates to the following XPath expression (ignoring
the last three tokens of the `nrow(expr)` call):
```{r}
xp <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)
```
We can search for this subtree with `xml2::xml_find_all()`:
```{r}
bad_nrow <- xml_find_all(xml, xp)
bad_nrow
```
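The same search can be tried without a network download. The sketch below runs an identical XPath expression on a two-line snippet, where only the first line uses the `1:nrow()` pattern. The snippet and the names `snippet_demo`, `doc_demo`, `xp_demo` and `hits_demo` are made up for illustration.

```{r}
library(xmlparsedata)
library(xml2)

# Line 1 uses the unsafe pattern, line 2 does not
snippet_demo <- c(
  "for (i in 1:nrow(df)) print(i)",
  "for (j in seq_len(nrow(df))) print(j)"
)
doc_demo <- read_xml(
  xml_parse_data(parse(text = snippet_demo, keep.source = TRUE))
)

# Same pattern as above: <expr> with a constant 1, a colon, and an nrow() call
xp_demo <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)
hits_demo <- xml_find_all(doc_demo, xp_demo)

# Report the source line of each match
xml_attr(hits_demo, "line1")
```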
There is only one hit, in line 334:
```{r}
cbind(332:336, src[332:336])
```
## License
MIT © Mango Solutions