hi there, kubasienki. I'm an AI developer and a fan of reinforcement learning projects too.
what part of the game are you thinking about?
the game is very diverse. there are classic schemes like Normal/Intermediate/Elite, which are purely skill-based. there are also a few probabilistic ones with varying amounts of RNG.
the T17 scheme depends on how good the weapons in the crates are. there are a few successful examples of AI in RNG games, like DeepStack beating people at poker, but generally RL seems to perform worse in RNG games. perhaps it makes sense to avoid such schemes, at least for the first attempts?
and the problem with schemes like Normal is that they are too complicated: too many weapons are available to the agent. learning to use them all at a high level is definitely not possible, or at least i don't think so. mastering a single weapon, like the rope, would require a very advanced agent by itself. I don't know your experience, perhaps you're training very big and complicated AIs and can pull that off, but my default assumption is that it makes sense to avoid these schemes for the first attempts
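to make the "too many weapons" point concrete, here's a rough sketch of one common way to shrink the problem: mask the agent's weapon choices down to whatever a simpler scheme allows. everything here is an assumption for illustration (the weapon names, the list order, the helper names); it is not the game's real API or weapon IDs.

```python
import random

# Hypothetical weapon list; names and order are illustrative, not real game IDs.
ALL_WEAPONS = ["bazooka", "grenade", "shotgun", "ninja_rope", "jetpack", "dynamite"]

def make_mask(allowed):
    """Return one boolean per weapon: True if the agent may select it."""
    allowed = set(allowed)
    unknown = allowed - set(ALL_WEAPONS)
    if unknown:
        raise ValueError(f"unknown weapons: {sorted(unknown)}")
    return [w in allowed for w in ALL_WEAPONS]

def sample_masked(scores, mask, rng=random):
    """Sample a weapon index in proportion to its non-negative score,
    giving masked-out weapons zero probability."""
    weights = [s if m else 0.0 for s, m in zip(scores, mask)]
    if sum(weights) <= 0:
        raise ValueError("no allowed weapon has a positive score")
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Assumed BnG-style restriction: only bazooka and grenade are selectable.
bng_mask = make_mask(["bazooka", "grenade"])
```

with a mask like this the agent only ever explores two weapons instead of the full arsenal, and you could widen the allowed list later once the simple case works.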
you mentioned aim assist, so I assume you wanted to use AI on a scheme like BnG?
BnG is probably the ideal scheme for training an agent, in terms of how simple it is. the only problem is that it would be kinda boring. the agent would just throw perfect grenades that land exactly on target, perhaps even plop opponents right away, if the agent is sophisticated enough. it's also a perfect scheme for training because it's the only scheme where the in-game bots are actually decent. once an agent can beat a few 5-UP level bots (like in a 1 vs 3 worms situation) it'll be close to a superhuman level
then there are schemes that depend on one niche skill, like roping. an agent that can rope would be amazing to watch, but it's hard to train. for example, in TTRR there's no way of telling how close you are to the finish, so there's no real way to give the agent clues about its progress. the other schemes that require high level roping, like wxw, usually depend on rules, like touching the walls, or collecting crates before attacking.
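on the TTRR progress problem, one possible workaround (just a sketch, and an assumption on my side) would be to hand-place waypoints along the route of a given map and give the agent a potential-based shaping bonus for moving from one waypoint towards the next. the waypoint coordinates, the function names and the idea that you can read the worm's position at all are all hypothetical here; the game itself doesn't tell you any of this.

```python
import math

def progress(pos, waypoints):
    """Fraction of the route covered, taken as the index of the nearest waypoint.
    Needs at least two waypoints."""
    nearest = min(range(len(waypoints)), key=lambda i: math.dist(pos, waypoints[i]))
    return nearest / (len(waypoints) - 1)

def shaping_bonus(pos, next_pos, waypoints, gamma=0.99):
    """Potential-based shaping term: gamma * phi(next state) - phi(current state),
    where phi is the progress fraction above."""
    return gamma * progress(next_pos, waypoints) - progress(pos, waypoints)

# Hypothetical hand-placed route for one map, from start to finish.
ROUTE = [(0, 0), (100, 0), (100, 100), (200, 100)]
```

for example, `shaping_bonus((0, 0), (100, 0), ROUTE)` is positive because the worm moved from the first waypoint to the second, and swapping the two positions makes it negative. a nearest-waypoint lookup like this can get confused on maps where the route doubles back close to itself, so it would need a smarter progress estimate there.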
perhaps the best scheme is something like hysteria, which is simple enough for training, yet the agent would still be interesting to watch, because there's room for strategizing, like killing your own worms to get turn advantage, or darksiding. perhaps the AI would discover interesting strategies? the 1 sec turn time would also help with training, and this is the scheme in which the agent would look the most spectacular (people can't do in 1 second what an agent can: walk, use multiple utilities at once, fly, throw weapons, knock, and so on)
what are your thoughts? maybe you wanted an agent for the game missions and i'm throwing all this stuff at you xD
btw, there are quite a few tool-assisted replays of the game, in different schemes. they are useful if you want to see what absolutely optimal play looks like and what is actually possible in the game. might be useful to know in case the agent gets stuck in a suboptimal state