Description
I have CUDA code that I am trying to compile for an AMD MI300A GPU. I noticed that one specific file was taking over an hour to compile at -O3. I isolated that file and profiled the compile, which showed that the time is being spent in the SROA optimization pass. Specifically, most of the time inside SROA appears to be spent handling debug records.
I've run llvm-reduce with a timeout-based interestingness test to reduce the IR while ensuring that it still takes a significant amount of time. The attached file is the reduced IR from just before llvm-reduce reached its debug records pass (after that pass, the time taken drops greatly). I'm trying to continue the reduction while skipping that pass to see whether I can shrink the IR further without cutting the runtime too much. Because the test relies on a timeout, the reduction process is slow, so I'll keep this issue up to date as I get a better test case.
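For anyone who wants to repeat the reduction, here is a minimal sketch of a timeout-based interestingness test of the kind described above. It is not the exact script I used: the helper name still_slow, the LIMIT and OPT variables, and the 300 second default are all assumptions for illustration. It relies on GNU timeout exiting with status 124 when the command runs past the limit, and on llvm-reduce treating an exit status of 0 as "interesting".

```shell
#!/bin/sh
# Hypothetical interestingness test for llvm-reduce (names and defaults are assumptions).
# Exit status 0 means "still interesting": SROA still runs longer than the limit.

LIMIT="${LIMIT:-300}"   # time limit in seconds (assumed value, tune for your machine)
OPT="${OPT:-opt}"       # path to the opt binary under test

still_slow() {
    # $1 = time limit in seconds, remaining arguments = command to run
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    # GNU timeout exits with 124 when the command exceeded the limit
    [ "$?" -eq 124 ]
}

# llvm-reduce passes the candidate IR file as the first argument.
if [ -f "$1" ]; then
    still_slow "$LIMIT" "$OPT" -passes=sroa -disable-output "$1"
    exit $?
fi
```

Saved as an executable file, say still_slow.sh, it could be used as llvm-reduce --test=./still_slow.sh sroa_long.ll. A candidate that makes opt fail or finish early is rejected, since only a timeout counts as interesting. Each accepted candidate costs the full limit in wall-clock time, which is why this kind of reduction is slow.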
Right now, the attached test case is about 100,000 lines, and running the SROA pass on it takes about 15 minutes on my system with the command opt -p sroa -disable-output sroa_long.ll
I'm having trouble attaching a file of that size to this issue, so I have attached it as a tar archive for now.