Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] Add way to put directions for bindgen in C headers being converted so more rust code can be generated. #3132

Open
oiaohm opened this issue Feb 15, 2025 · 8 comments
Labels
rust-for-linux Issues relevant to the Rust for Linux project

Comments

@oiaohm
Copy link

oiaohm commented Feb 15, 2025

Looking at this patch that Christoph Hellwig NAKed

https://lore.kernel.org/linux-kernel/[email protected]/

and comparing to
https://github.com/torvalds/linux/blob/master/include/linux/dma-mapping.h

Maybe ask question why is there so much duplication. Then going hang on there is not enough information in the C header file. Then also remember sparse and other tools like it add bits to C code and Headers so they can function. So why cannot rust bindgen do the same.

The section of code that drew my attention in the NAK submit is this.
dma .rs

+pub mod attrs {
+    use super::Attrs;
+
+    /// Specifies that reads and writes to the mapping may be weakly ordered, that is that reads
+    /// and writes may pass each other.
+    pub const DMA_ATTR_WEAK_ORDERING: Attrs = Attrs(bindings::DMA_ATTR_WEAK_ORDERING);
+
+    /// Specifies that writes to the mapping may be buffered to improve performance.
+    pub const DMA_ATTR_WRITE_COMBINE: Attrs = Attrs(bindings::DMA_ATTR_WRITE_COMBINE);
+
+    /// Lets the platform to avoid creating a kernel virtual mapping for the allocated buffer.
+    pub const DMA_ATTR_NO_KERNEL_MAPPING: Attrs = Attrs(bindings::DMA_ATTR_NO_KERNEL_MAPPING);
+
+    /// Allows platform code to skip synchronization of the CPU cache for the given buffer assuming
+    /// that it has been already transferred to 'device' domain.
+    pub const DMA_ATTR_SKIP_CPU_SYNC: Attrs = Attrs(bindings::DMA_ATTR_SKIP_CPU_SYNC);
+
+    /// Forces contiguous allocation of the buffer in physical memory.
+    pub const DMA_ATTR_FORCE_CONTIGUOUS: Attrs = Attrs(bindings::DMA_ATTR_FORCE_CONTIGUOUS);
+
+    /// This is a hint to the DMA-mapping subsystem that it's probably not worth the time to try
+    /// to allocate memory to in a way that gives better TLB efficiency.
+    pub const DMA_ATTR_ALLOC_SINGLE_PAGES: Attrs = Attrs(bindings::DMA_ATTR_ALLOC_SINGLE_PAGES);
+
+    /// This tells the DMA-mapping subsystem to suppress allocation failure reports (similarly to
+    /// __GFP_NOWARN).
+    pub const DMA_ATTR_NO_WARN: Attrs = Attrs(bindings::DMA_ATTR_NO_WARN);
+
+    /// Used to indicate that the buffer is fully accessible at an elevated privilege level (and
+    /// ideally inaccessible or at least read-only at lesser-privileged levels).
+    pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+}

In my eyes this could be generated from the following from following section out of dma-mapping.h

/**
 * List of possible attributes associated with a DMA mapping. The semantics
 * of each attribute should be defined in Documentation/core-api/dma-attributes.rst.
 */

/*
 * DMA_ATTR_WEAK_ORDERING: Specifies that reads and writes to the mapping
 * may be weakly ordered, that is that reads and writes may pass each other.
 */
#define DMA_ATTR_WEAK_ORDERING		(1UL << 1)
/*
 * DMA_ATTR_WRITE_COMBINE: Specifies that writes to the mapping may be
 * buffered to improve performance.
 */
#define DMA_ATTR_WRITE_COMBINE		(1UL << 2)
/*
 * DMA_ATTR_NO_KERNEL_MAPPING: Lets the platform to avoid creating a kernel
 * virtual mapping for the allocated buffer.
 */
#define DMA_ATTR_NO_KERNEL_MAPPING	(1UL << 4)
/*
 * DMA_ATTR_SKIP_CPU_SYNC: Allows platform code to skip synchronization of
 * the CPU cache for the given buffer assuming that it has been already
 * transferred to 'device' domain.
 */
#define DMA_ATTR_SKIP_CPU_SYNC		(1UL << 5)
/*
 * DMA_ATTR_FORCE_CONTIGUOUS: Forces contiguous allocation of the buffer
 * in physical memory.
 */
#define DMA_ATTR_FORCE_CONTIGUOUS	(1UL << 6)
/*
 * DMA_ATTR_ALLOC_SINGLE_PAGES: This is a hint to the DMA-mapping subsystem
 * that it's probably not worth the time to try to allocate memory to in a way
 * that gives better TLB efficiency.
 */
#define DMA_ATTR_ALLOC_SINGLE_PAGES	(1UL << 7)
/*
 * DMA_ATTR_NO_WARN: This tells the DMA-mapping subsystem to suppress
 * allocation failure reports (similarly to __GFP_NOWARN).
 */
#define DMA_ATTR_NO_WARN	(1UL << 8)

/*
 * DMA_ATTR_PRIVILEGED: used to indicate that the buffer is fully
 * accessible at an elevated privilege level (and ideally inaccessible or
 * at least read-only at lesser-privileged levels).
 */
#define DMA_ATTR_PRIVILEGED		(1UL << 9)

Yes I don't know enough Rust or rust bindgen to have to this 100 percent correct.

Something like the following inserted into dma-mapping.h just before the section I quoted out of dma-mapping.h


#ifdef RUST_BINDGEN
   pub mod attrs {
       use super::Attrs;
      /* possible some transformation direction*./
#end


And this just after the section I quoted out of dma-mapping.h.

#ifdef RUST_BINDGEN
}
#end

Yes so that BINDGEN can get the information to generate what was in dma.rs so avoiding the sync issue.

I see this is the big source of friction in the Linux kernel. Not being able to put directions in the C header file means developers and up duplicating code/comments so causing a long term code maintenance problems by creating more areas in a code base to go out of sync that then results in the code not building so causing maintainers issues.

I also wonder how many hours are the rust Linux kernel developers putting into maintaining these abstractions every time something changes that should have been generated code that would have updated without human labor.

@ojeda
Copy link
Contributor

ojeda commented Feb 15, 2025

You are correct that parts of the process could be further automated by tooling, possibly aiding it with extra information on the C header file. In Rust for Linux we welcome progress towards that and, in fact, bindgen has added features we requested in the past to help on that.

However, most of the work is not really on the parts that could easily be automated further (even assuming extra annotations on the C header), but rather on the rest of the code of the abstractions which (so far) require humans to be designed, reviewed and maintained over time.

For instance, consider a data structure API. The C header may provide a set of types and functions and so on. The Rust API, however, may need to be structured quite differently, in order to provide a safe API that wraps the C one. Sometimes, even, the chance to write these abstractions was also seen as a chance to improve on the design of the C API anyway.

@ojeda ojeda added the rust-for-linux Issues relevant to the Rust for Linux project label Feb 15, 2025
@oiaohm
Copy link
Author

oiaohm commented Feb 16, 2025

For instance, consider a data structure API. The C header may provide a set of types and functions and so on. The Rust API, however, may need to be structured quite differently, in order to provide a safe API that wraps the C one. Sometimes, even, the chance to write these abstractions was also seen as a chance to improve on the design of the C API anyway.

There is something doing rust coding you are missing. If you are needing alter structure the C API to make them safe with rust manually there is most likely missing information in C code that should be there so the C code can be correctly run over static analyzer to find defects. Lot of high end C/C++ static analyzers have means to add extra to the C/C++ code to make the results more correct.

Sparse with the Linux kernel has added annotations in the code so it can detect bad particular behaviors but this could be expanded on..

Yes I know what you are doing is rust for linux. But lets be real Linux C kernel code is going to remain around for a very long time so this need audit quality improvements as well. Yes possible two birds one stone. Add extra information into the Linux C so rust bindings can generate well and this extra information then use by sparse or the like to find places where Linux kernel C code is doing unsafe things that it should not be doing.

This is the bit I am not sure on is exactly what need to be annotations.

Majority of things rust forbid programmers from doing in safe rust code you should not be doing in Linux C kernel code if you want to avoid crashing/kernel panic the system(yes I have first hand experience doing most of them). Issue on the C side it having tooling at this stage to fully detect those things.

Basically I see this as absolute need to meet in the middle between the rust and C maintainers of the Linux kernel. Yes it will require people making rust abstraction for the Linux kernel to be writing some C kernel code patches to add the missing information need so rust abstraction generation can work and possible making a C audit tool demo the worth to the C maintainers of the additions. So those attempting to make rust abstractions are helping the C developers to have better code so the C Linux maintainers have less reason to say no to changes because the changes solve some of their problems.

Also should reduce the manual code maintenance on rust maintainers so lower risk of burn out due to over work taking care of code and lower risk of fights with C maintainers causing burn outs because C maintainers should be less likely to question if adding rust is beneficial or not.

@ojeda
Copy link
Contributor

ojeda commented Feb 16, 2025

We are well aware there is a lot of missing information in the C headers and, as I mentioned in my previous reply, it would be great to add more information to those. We would also support efforts in the C standard (or C implementations) to make C safer or to make interop with Rust better.

However, automating the entire process to the point that safe Rust abstractions do not need to be designed is an open problem in the general case. But if you find a way to do so, please publish it! It would be extremely useful and it would have consequences beyond Linux.

@oiaohm
Copy link
Author

oiaohm commented Feb 17, 2025

We are well aware there is a lot of missing information in the C headers and, as I mentioned in my previous reply, it would be great to add more information to those.

Yes to be able to add that information people have to know what you as rust developer want. As I said I don't know rust well. I can see by what being done that information is missing but there is no list detailing what that information in fact is. There is no wishlist for what for rust conversion you would want in the C header file if you could have it.

We would also support efforts in the C standard (or C implementations) to make C safer or to make interop with Rust better.

This a failure to understand C standard process. Because you are not doing this. There is no list for clang/gcc or others to know exactly what you want for bindgen.

C standard process requires before features get added to the standard there is a implementation in something.

To be able to add to the c standard the feature has to be in C compiler or C static analyzer or C conversion tool. Rust bindgen is a C conversion tool. C is not central commit added,

https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
Yes is perfectly valid under C standard to add your own extensions. Yes when these extensions prove their worth then they get added to the standard.

No where in the bindgen documentation can I find bindgen macro for its own extensions to C to be hidden from compilers that don't support them. Yes BINDGEN or RUST_BINDGEN could be valid of course this requires bindgen project to choose what it C Macro identifier to be there when bindgen is being run. Yes this normally contains the compiler/tool version number..
https://documentation.blackduck.com/bundle/coverity-docs/page/coverity-analysis/topics/using_predefined_macros_for_coverity_analysis-specific_compilations.html

Yes it absolutely normal for any tool processing C to add its own C macros to allow custom information just for it in the C code this is standard C code.

This is something I first looked for was what was the rust bindgen C macros to identify itself to the C code and there are none.

However, automating the entire process to the point that safe Rust abstractions do not need to be designed is an open problem in the general case. But if you find a way to do so, please publish it! It would be extremely useful and it would have consequences beyond Linux.

This is more that I am seeing the problem and the foundation work is missing that I cannot fully do because I really don't even have enough rust knowledge to know what C macros should be made to identify bindgen being used..

Big issues.

  1. No wish list detailing what data to generate rust bindings you would like to have C to contain.
  2. No Bindgen particular C extension support this include no C macro it identify itself as the starting point.

C standard allows custom extensions and you are not using this allowance.

Yes I do see this as a general problem to rust as well. But I see this problem coming from not providing the information for what you need and not taking advantage of what C language in fact offers. C language lack of total central control does have some serous advantages that bindgen is not using.

Yes my first post contains a very rough example of what the solution could look like. Yes putting non C code inside C for generation reasons has been done before as well.

Of course you might decide to work directly with sparse or some other tool to extend or use their annotation that bindgen will support reading. Of course again to pick out those to suggest I would need a list of what you would need for ideal generation.

@ojeda
Copy link
Contributor

ojeda commented Feb 17, 2025

There is no wishlist for what for rust conversion you would want in the C header file if you could have it.

There is: Rust-for-Linux/linux#353.

(If you mean items that would require tweaking the C headers, such as a safe function marker, then we could add there some of the ideas we discussed in the past. However, those require discussion first, including with C maintainers, i.e. it is not a guarantee that we could use any particular one.)

This a failure to understand C standard process.

I was the Spanish representative in WG14 for a few years...

I cannot fully do because I really don't even have enough rust knowledge

If you want to help, then please see the link above and see if you can tackle some of those items. Those should be technically doable, i.e. they are not open problems.

Another approach is looking at the safe Rust abstractions we already have upstream, and trying to identify places that could be automated, like you did in the first message. That is also great. For this, I would suggest, before writing bindgen code, discussing each pattern you identify (i.e. that you think it could be automated) in our Zulip, since there have been previous discussions about some patterns already, and there is more people that can help/collaborate on this.

To be clear, the only thing I was trying to say in my previous messages is that generating the complete safe Rust abstractions is an open problem, and it is harder than you probably realize. Please see also: https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings.

@emilio
Copy link
Contributor

emilio commented Feb 17, 2025

Depending on how often you find particular patterns, it seems to me a potentially useful thing that could be worked on is some sort of custom attribute support for bindgen.

Bindgen already supports metadata in the terrible way of XML annotations (e.g., you can document a type with /// <div rustbindgen replaces="..."> and that instructs bindgen to do something).

A potentially nicer approach for some of this could be to use something like __attribute__((annotate("..."))) and let bindgen propagate that to via a callback, which you could use to generate stuff like the code in the OP. While I think that could be useful for some cases, I agree with @ojeda that doing it generally seems quite a feat / would need a lot of project-specific information.

But if something like the above could be useful to automate, let's say, 15% of the manual APIs that the kernel folks write, it might be worth doing?

@ojeda
Copy link
Contributor

ojeda commented Feb 18, 2025

Definitely, anything that can be automated is welcome. Especially things that contain a lot of repetition, like the one in the OP, are prime candidates.

XML in comments would probably be not liked by C kernel maintainers, but being able to write e.g. __opaque (which is how normally attributes are done in the kernel, i.e. with a macro that expands to __attribute__((...))) would be great and would allow us to keep the information closer to the C items, rather than in a file. I think that may be acceptable for C kernel maintainers.

Custom attribute support with callbacks would be definitely useful and is something we also wondered about in the past, e.g. perhaps we would do our own __safe ([[safe]]), or perhaps annotate certain enums with specific details etc. We would need to move away from the CLI, though, but we may want to do that eventually anyway.

Custom attribute support without callbacks that saves custom information on the Rust generated bindings could potentially be interesting -- one could then invoke a Rust macro on that, which could match differently depending on the saved attributes. In a way, it is like a callback but executed by a Rust macro. In some cases no custom information may be needed, i.e. just the ability to execute a macro on that set of C items. But this may be all too fancy, just to avoid callbacks.

One common issue, though, are docs -- we typically want to have nice Rust docs if possible, and in some cases we may need to rewrite to some degree or update the C side, which isn't great. For instance, it may not be easy to convert to Markdown, or we may want to have extra docs in certain items, or have Rust examples...

@oiaohm
Copy link
Author

oiaohm commented Feb 18, 2025

To be clear, the only thing I was trying to say in my previous messages is that generating the complete safe Rust abstractions is an open problem, and it is harder than you probably realize. Please see also: https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings.

I had not found that abstractions vs bindings sorry to say in my eyes that document is goof up central. That document is most likely cause of some of the problem.

Where are you generated abstractions in that abstractions vs bindings. Take the one I pointed to. That should be a generated abstraction from the C header file so that the abstractions contents stays synced with the C header file.

That abstractions-vs-bindings write. has only unsafe code coming out of processing the C header file. Reality picking up list of #define in the C file and exposing them as a safe abstraction by code generation should absolutely be possible.

This would stop the Linux kernel C maintainer looking at this going you are duplicating my interface for no good reason with means for duplicate copy to fall out of sync and wanting to NAK it to avoid future issues..

XML in comments would probably be not liked by C kernel maintainers, but being able to write e.g. __opaque (which is how normally attributes are done in the kernel, i.e. with a macro that expands to __attribute__((...))) would be great and would allow us to keep the information closer to the C items, rather than in a file. I think that may be acceptable for C kernel maintainers.

I would not be 100 percent sure of the XML in comments being rejected. Generated code in the past in the Linux kernel have include XML in comments. As long as the comments don't start with .

Something like __opaque would be like a sparse with the Linux kernel uses. Lot of those attribute(()) with macros over with the Linux kernel are being picked up by sparse/smatch and used for Static code analysis.

Custom attribute support with callbacks would be definitely useful and is something we also wondered about in the past, e.g. perhaps we would do our own __safe ([[safe]]), or perhaps annotate certain enums with specific details etc. We would need to move away from the CLI, though, but we may want to do that eventually anyway.

https://sparse.docs.kernel.org/en/latest/annotations.html
There is already a __safe with sparse. I guess this is not the same safe.

https://github.com/torvalds/linux/blob/master/include/linux/compiler_types.h
Notice the #ifdef CHECKER in this file at line 28. Yes CHECKER is sparse c macro identifier. then the else at 51 to make all the sparse __safe and so on nothing operations.

Yes complier_types.h in the Linux kernel would have to grow a bindgen section to add the __opaque so that its only in the code as attribute((...)) when bindgen is operating.

One of the first things required is choosing bindgen c macro identifier before you can add anything..

Yes a complier could have its own attribute((safe)) that means something completely different.

Another consideration with custom attribute is this something sparse could process on C to locate defects in the Linux kernel C. If so Linux kernel maintainer will have very limited grounds to not to allow the addition to the source.

Ok looking at that complier_types.h I see bindgen does have a c macro identifier "BINDGEN"

So it is possible todo.

#ifdef  __BINDGEN__ 
#define __opaque  __attribute__((opaque)) 
#define __rust_safe(x) __attribute__((rust_safe(x)))
#else
#define  __opaque
#define __rust_safe(x)
#end

Yes complier_types.h in the Linux kernel would be where you would be declaring these operations.

This is what I am talking about working out what should in the C so you don't have keep multi files synced with each other. Instead look at the C file there is the complete story.

Of course there is the possibility that linux kernel already has some things marked that would be useful to rust generation and bindgen is not seeing them because you don't have #ifdef BINDGEN turn them on and lack method to process the attribute information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rust-for-linux Issues relevant to the Rust for Linux project
Projects
None yet
Development

No branches or pull requests

3 participants