[feature request] Add way to put directions for bindgen in C headers being converted so more rust code can be generated. #3132
Comments
You are correct that parts of the process could be further automated by tooling, possibly aiding it with extra information on the C header file. In Rust for Linux we welcome progress towards that and, in fact, However, most of the work is not really on the parts that could easily be automated further (even assuming extra annotations on the C header), but rather on the rest of the code of the abstractions which (so far) require humans to be designed, reviewed and maintained over time.

For instance, consider a data structure API. The C header may provide a set of types and functions and so on. The Rust API, however, may need to be structured quite differently, in order to provide a safe API that wraps the C one. Sometimes, even, the chance to write these abstractions was also seen as a chance to improve on the design of the C API anyway.
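To make the data structure point concrete, here is a minimal sketch of how a safe Rust API can end up shaped differently from the C one it wraps. Everything in it is hypothetical: `c_stack`, `c_stack_push`, `Stack` and `Full` are invented for illustration, and the `bindings` module is written in plain Rust as a stand-in for bindgen output so that the sketch runs without linking any C code.

```rust
// Hypothetical stand-in for bindgen output: a raw struct and an unsafe
// function mirroring a C API such as `int c_stack_push(struct c_stack *, int)`.
#[allow(non_camel_case_types, dead_code)]
mod bindings {
    #[repr(C)]
    pub struct c_stack {
        pub items: [i32; 4],
        pub len: usize,
    }

    /// Returns 0 on success and -1 when the stack is full (C-style error code).
    ///
    /// # Safety
    /// `s` must point to a valid, initialized `c_stack` with no other aliases.
    pub unsafe fn c_stack_push(s: *mut c_stack, v: i32) -> i32 {
        // SAFETY: the caller guarantees `s` is valid and exclusively borrowed.
        let s = unsafe { &mut *s };
        if s.len == s.items.len() {
            return -1;
        }
        s.items[s.len] = v;
        s.len += 1;
        0
    }
}

/// Error type replacing the C-style `-1` return value.
#[derive(Debug, PartialEq)]
pub struct Full;

/// Hand-designed safe abstraction: it owns the C struct, so callers never
/// touch a raw pointer, and failures surface as `Result` instead of integers.
pub struct Stack {
    inner: bindings::c_stack,
}

impl Stack {
    pub fn new() -> Self {
        Stack { inner: bindings::c_stack { items: [0; 4], len: 0 } }
    }

    pub fn push(&mut self, v: i32) -> Result<(), Full> {
        // SAFETY: `self.inner` is a valid `c_stack` exclusively owned by `self`.
        let ret = unsafe { bindings::c_stack_push(&mut self.inner, v) };
        if ret == 0 { Ok(()) } else { Err(Full) }
    }

    /// Borrow-checked view of the stored items; has no direct C counterpart.
    pub fn as_slice(&self) -> &[i32] {
        &self.inner.items[..self.inner.len]
    }
}
```

The decisions a generator cannot easily make on its own are the ownership model, the error type and the extra `as_slice` view. None of them is stated anywhere in the C declaration.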
There is something you are missing from the Rust side. If you have to restructure the C API manually to make it safe for Rust, then most likely the C code is missing information that should be there anyway, so that the C code can be run through a static analyzer to find defects. A lot of high-end C/C++ static analyzers have ways to add extra annotations to the C/C++ code to make the results more accurate. Sparse, as used with the Linux kernel, relies on annotations added to the code so it can detect particular bad behaviors, and this could be expanded on.

Yes, I know that what you are doing is Rust for Linux. But let's be real: the Linux kernel C code is going to remain around for a very long time, so it needs audit-quality improvements as well. This could be two birds with one stone: add extra information to the Linux C code so that Rust bindings generate well, and then let sparse or a similar tool use the same information to find places where the kernel C code is doing unsafe things it should not be doing.

The bit I am not sure about is exactly what the annotations need to be. The majority of the things Rust forbids in safe Rust code are things you should not be doing in Linux kernel C code either, if you want to avoid crashing or panicking the system (yes, I have first-hand experience doing most of them). The issue on the C side is having tooling, at this stage, that can fully detect those things.

Basically, I see an absolute need for the Rust and C maintainers of the Linux kernel to meet in the middle. Yes, it will require the people writing Rust abstractions for the Linux kernel to write some C kernel patches that add the missing information needed for abstraction generation to work, and possibly to build a C audit tool that demonstrates the worth of the additions to the C maintainers. That way, those writing Rust abstractions are helping the C developers have better code, and the C maintainers have less reason to say no to the changes, because the changes solve some of their problems.

It should also reduce the manual code maintenance load on the Rust maintainers. That lowers the risk of burnout from overwork, and the risk of burnout from fights with C maintainers, because the C maintainers should be less likely to question whether adding Rust is beneficial.
We are well aware there is a lot of missing information in the C headers and, as I mentioned in my previous reply, it would be great to add more information to those. We would also support efforts in the C standard (or C implementations) to make C safer or to make interop with Rust better. However, automating the entire process to the point that safe Rust abstractions do not need to be designed is an open problem in the general case. But if you find a way to do so, please publish it! It would be extremely useful and it would have consequences beyond Linux.
Yes, but to be able to add that information, people have to know what you as Rust developers want. As I said, I don't know Rust well. I can see from what is being done that information is missing, but there is no list detailing what that information actually is. There is no wishlist of what you would want in the C header file for Rust conversion if you could have it.

This is a failure to understand the C standard process, because you are not doing this: there is no list that lets clang, gcc or others know exactly what you want for bindgen. The C standard process requires that a feature is implemented somewhere before it gets added to the standard. To be added to the C standard, the feature has to exist in a C compiler, a C static analyzer or a C conversion tool. Rust bindgen is a C conversion tool. Features are not added to C centrally; implementations add them first as extensions: https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

Nowhere in the bindgen documentation can I find a bindgen macro that lets its own extensions to C be hidden from compilers that don't support them. BINDGEN or RUST_BINDGEN could be valid; of course, this requires the bindgen project to choose which C macro identifier is defined when bindgen is being run. Normally such a macro also carries the compiler or tool version number. It is absolutely normal for any tool that processes C to define its own C macros so that custom information just for that tool can live in the C code; this is standard C. The first thing I looked for was the C macros that rust bindgen uses to identify itself to the C code, and there are none.

This is more that I can see the problem and that the foundation work is missing, but I cannot fully do that work myself, because I don't have enough Rust knowledge to know what C macros should be defined to identify that bindgen is in use. Big issues.

The C standard allows custom extensions and you are not using this allowance. Yes, I do see this as a general problem for Rust as well. But I see the problem as coming from not stating what you need and not taking advantage of what the C language actually offers. The C language's lack of total central control does have some serious advantages that bindgen is not using. My first post contains a very rough example of what the solution could look like. Putting non-C code inside C for generation reasons has been done before as well. Of course, you might decide to work directly with sparse or some other tool, extending or reusing their annotations so that bindgen supports reading them. Again, to pick out which of those to suggest, I would need a list of what you would need for ideal generation.
There is: Rust-for-Linux/linux#353. (If you mean items that would require tweaking the C headers, such as a
I was the Spanish representative in WG14 for a few years...
If you want to help, then please see the link above and see if you can tackle some of those items. Those should be technically doable, i.e. they are not open problems.

Another approach is looking at the safe Rust abstractions we already have upstream, and trying to identify places that could be automated, like you did in the first message. That is also great. For this, I would suggest, before writing

To be clear, the only thing I was trying to say in my previous messages is that generating the complete safe Rust abstractions is an open problem, and it is harder than you probably realize. Please see also: https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings.
Depending on how often you find particular patterns, it seems to me a potentially useful thing that could be worked on is some sort of custom attribute support for bindgen. Bindgen already supports metadata in the terrible way of XML annotations (e.g., you can document a type with

A potentially nicer approach for some of this could be to use something like

But if something like the above could be useful to automate, let's say, 15% of the manual APIs that the kernel folks write, it might be worth doing?
Definitely, anything that can be automated is welcome. Especially things that contain a lot of repetition, like the one in the OP, are prime candidates.

XML in comments would probably not be liked by C kernel maintainers, but being able to write e.g.

Custom attribute support with callbacks would definitely be useful and is something we also wondered about in the past, e.g. perhaps we would do our own

Custom attribute support without callbacks that saves custom information on the Rust generated bindings could potentially be interesting -- one could then invoke a Rust macro on that, which could match differently depending on the saved attributes. In a way, it is like a callback but executed by a Rust macro. In some cases no custom information may be needed, i.e. just the ability to execute a macro on that set of C items. But this may be all too fancy, just to avoid callbacks.

One common issue, though, is docs -- we typically want to have nice Rust docs if possible, and in some cases we may need to rewrite to some degree or update the C side, which isn't great. For instance, it may not be easy to convert to Markdown, or we may want to have extra docs in certain items, or have Rust examples...
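As a rough illustration of the "Rust macro that matches on the saved attributes" idea, the sketch below assumes bindgen could wrap each C item in a macro invocation that carries the attribute it found on the C declaration. No such output format exists in bindgen today: the `c_item!` macro, the `opaque` and `plain` tags, and the `DeviceHandle` and `Point` types are all invented for this example.

```rust
// Hypothetical: each arm plays the role of a "callback", selected by the
// attribute tag that a future bindgen might save alongside the C item.
macro_rules! c_item {
    // Item tagged `opaque` on the C side: keep only its size and hide the layout.
    (opaque, $name:ident, $size:expr) => {
        #[allow(dead_code)]
        pub struct $name {
            _private: [u8; $size],
        }
        impl $name {
            pub const IS_OPAQUE: bool = true;
        }
    };
    // Item with no special tag: expose the fields and derive common traits.
    (plain, $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[repr(C)]
        #[derive(Debug, Default, Clone, Copy, PartialEq)]
        pub struct $name {
            $(pub $field: $ty),*
        }
        impl $name {
            pub const IS_OPAQUE: bool = false;
        }
    };
}

// What the (hypothetical) generated bindings file would contain:
c_item!(opaque, DeviceHandle, 16);
c_item!(plain, Point { x: i32, y: i32 });
```

The generator stays simple, since it only forwards tags, while the project owning the macro decides what each tag means. This matches the "like a callback but executed by a Rust macro" description.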
I had not found that abstractions-vs-bindings document. Sorry to say, but in my eyes that document is goof-up central, and it is most likely the cause of some of the problem. Where are the generated abstractions in that abstractions-vs-bindings picture? Take the one I pointed to. That should be an abstraction generated from the C header file, so that its contents stay synced with the C header file. The abstractions-vs-bindings write-up has only unsafe code coming out of processing the C header file. In reality, picking up a list of #define entries in the C file and exposing them as a safe abstraction through code generation should absolutely be possible. This would stop a Linux kernel C maintainer from looking at this, thinking that you are duplicating their interface for no good reason in a way that lets the duplicate copy fall out of sync, and wanting to NAK it to avoid future issues.

I would not be 100 percent sure that XML in comments would be rejected. Generated code in the Linux kernel has included XML in comments in the past. As long as the comments don't start with . Something like __opaque would be like the annotations that sparse uses with the Linux kernel. A lot of the __attribute__((...)) uses wrapped in macros across the Linux kernel are picked up by sparse/smatch and used for static code analysis.

https://sparse.docs.kernel.org/en/latest/annotations.html
https://github.com/torvalds/linux/blob/master/include/linux/compiler_types.h

Yes, compiler_types.h in the Linux kernel would have to grow a bindgen section that adds __opaque, so that it only appears in the code as __attribute__((...)) when bindgen is operating. One of the first things required is choosing a bindgen C macro identifier before you can add anything. A compiler could have its own __attribute__((safe)) that means something completely different. Another consideration with a custom attribute is whether it is something sparse could process on the C side to locate defects in the Linux kernel C code. If so, Linux kernel maintainers will have very limited grounds to refuse the addition to the source. OK, looking at that compiler_types.h, I see bindgen does have a C macro identifier, "__BINDGEN__", so it is possible to do.

Yes, compiler_types.h in the Linux kernel would be where you declare these. This is what I am talking about: working out what should be in the C code so you don't have to keep multiple files synced with each other. Instead, you look at the C file and the complete story is there. Of course, there is the possibility that the Linux kernel already has some things marked that would be useful for Rust generation, and bindgen is not seeing them because #ifdef __BINDGEN__ does not turn them on and bindgen lacks a way to process the attribute information.
Looking at this patch that Christoph Hellwig NAKed
https://lore.kernel.org/linux-kernel/[email protected]/
and comparing to
https://github.com/torvalds/linux/blob/master/include/linux/dma-mapping.h
Maybe ask the question: why is there so much duplication? Then you realize: hang on, there is not enough information in the C header file. Then also remember that sparse and other tools like it add bits to C code and headers so they can function. So why can't rust bindgen do the same?
The section of code that drew my attention in the NAKed submission is this.
dma.rs
In my eyes this could be generated from the following section of dma-mapping.h
I don't know enough Rust or rust bindgen to get this 100 percent correct.
Something like the following would be inserted into dma-mapping.h just before the section I quoted from dma-mapping.h
And this just after the section I quoted from dma-mapping.h.
That way bindgen can get the information it needs to generate what was in dma.rs, avoiding the sync issue.
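The code blocks of the original post are not reproduced here, so the following is only a hedged sketch of the kind of output being asked for: a safe flag type whose constants are derived mechanically from a list of C `#define` names. The `bindings` module stands in for raw bindgen output, with names and values modeled on the `DMA_ATTR_*` flags; treat them as illustrative. The `flag_group!` macro is a hypothetical placeholder for whatever a directive in the header would make the generator emit, and `Attrs` is an assumed name for the wrapper type.

```rust
use std::ops::BitOr;

// Stand-in for raw bindgen output: one plain integer constant per `#define`.
#[allow(dead_code)]
mod bindings {
    pub const DMA_ATTR_WEAK_ORDERING: u32 = 1 << 1;
    pub const DMA_ATTR_WRITE_COMBINE: u32 = 1 << 2;
    pub const DMA_ATTR_NO_KERNEL_MAPPING: u32 = 1 << 4;
}

/// Safe flag type: the field is private, so values can only be built from
/// the generated constants and combinations of them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Attrs(u32);

impl Attrs {
    /// Raw value to hand back to the C side.
    pub fn as_raw(self) -> u32 {
        self.0
    }

    /// True when every bit of `other` is set in `self`.
    pub fn contains(self, other: Attrs) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Attrs {
    type Output = Attrs;
    fn bitor(self, rhs: Attrs) -> Attrs {
        Attrs(self.0 | rhs.0)
    }
}

// Hypothetical generator step: given only the list of names between the two
// markers in the header, emit one typed constant per raw constant.
macro_rules! flag_group {
    ($ty:ident: $($name:ident),+ $(,)?) => {
        impl $ty {
            $(pub const $name: $ty = $ty(bindings::$name);)+
        }
    };
}

flag_group!(Attrs: DMA_ATTR_WEAK_ORDERING, DMA_ATTR_WRITE_COMBINE, DMA_ATTR_NO_KERNEL_MAPPING);
```

With this shape, adding a flag on the C side would only require regenerating, since the list passed to `flag_group!` is the only part that depends on the header. The hand-written part (`Attrs` and its methods) is the piece that would still need a human to design once.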
I see this as the big source of friction in the Linux kernel. Not being able to put directions in the C header file means developers end up duplicating code and comments. That causes long-term maintenance problems: it creates more places in the code base that can go out of sync, which then results in code that does not build and causes issues for maintainers.
I also wonder how many hours the Rust Linux kernel developers put into maintaining these abstractions every time something changes, when this should have been generated code that updated without human labor.