Hello, and thank you for inviting me to look at this project. It is very interesting given my own experience with CUDA, and the fact that you have started it in Python fits my skills as well.
First, my approach to projects like this is to get a general understanding of the algorithms involved (here you have already provided a very detailed description). Since my background is in computer engineering rather than physics, I will keep a computational focus; I would not be able to say much about the physics behind the algorithms you use.
Then I would take the code apart (as far as possible) and analyze its current performance and bottlenecks. I had a quick look at your [login to view URL] code, and I can already recognize several parts with a vectorized approach that would be well suited to CUDA GPU offloading. It confuses me a little that parts of the code are currently unused; are they related to the modification you mentioned?
Your RTX A6000 GPU is certainly a beast, but have you done the math on the memory requirements of 50,000 electrons with 100 past positions each? In CUDA, the total size would depend on the floating-point precision we choose.
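As a rough back-of-the-envelope sketch (assuming 3D position vectors, which is my guess and not something stated in your description), the history buffer alone would be on the order of tens to hundreds of megabytes:

```python
# Rough memory estimate for the electron position-history buffer.
# Assumptions (mine, not from the project): 3D positions, one value per coordinate.
n_electrons = 50_000
n_history = 100   # past positions kept per electron
n_coords = 3      # assumed 3D

for name, dtype_bytes in [("float32", 4), ("float64", 8)]:
    total_bytes = n_electrons * n_history * n_coords * dtype_bytes
    print(f"{name}: {total_bytes / 2**20:.1f} MiB")
```

Either way this fits comfortably in the A6000's 48 GB, but it matters for layout and transfer costs, so it is worth pinning down early.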
That said, I would be happy to start a discussion about this if you're interested. CUDA projects here are currently side projects for me, as most of my time goes into a long-term project elsewhere. However, I take these projects very seriously, since I am committed to staying active in this programming field.
Regards,
Thanassis