Okay sorry for the slight necro on this but there were a few philosophical points that were brought up that took a while to respond to and I want to cap this off to wrap up 0.95.1a before I get too far into looking at 0.96a:
Look at the radiants tanked/DP between seconds 1017 and 1031.
Absolutely unbeatable (ignore tachyons, those would slightly skew the test).
Why is this relevant? As in, why would it matter to look at the number of Radiants tanked in a specific 14-second window around 17 minutes into combat? This seems entirely arbitrary. Not to mention, it doesn't address what I was saying, namely how does determining this figure address:
But if you don't buy looking at how fast your fleet can kill an enemy fleet as a gauge of effectiveness, how do you justify the worth of a Monitor without reference to this? You don't advance the battle simply by sitting there tanking incoming damage; you advance the battle by having it let you accomplish more than you could otherwise.
Where is failure %?
You can do that if you want, but I've found that it's not necessary. Generally speaking, the ships with the higher overall battle DPS relative to their DP have a lower failure rate. That makes sense, since overall battle DPS includes having to absorb enemy offensive power, and ships die when there's too much enemy offensive power to absorb. So in my testing, it was the ships that were relatively weak (such as the Eagle) that had issues with dying, not the ships that were strong (such as the Gryphon). A double Ordos fleet is also a big enough battle that weaknesses in defense become apparent.
And how do you propose to measure this in a manageable way? A number of fleets have a success rate of over 90%; you'd need a huge number of trials to distinguish between a fleet with 92% success rate versus another with 96% success rate, for example.
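To put a rough number on "huge", here is a minimal sketch of a textbook sample-size estimate for comparing two success rates. It assumes a two-sided two-proportion z-test with the usual normal approximation; the 5% significance level, the 80% power, and the function name `trials_per_fleet` are my own choices for illustration, not anything taken from the testing itself.

```python
from math import ceil, sqrt
from statistics import NormalDist


def trials_per_fleet(p1, p2, alpha=0.05, power=0.8):
    """Approximate battles needed PER FLEET to tell success rates p1 and p2 apart.

    Uses the standard normal-approximation formula for a two-sided
    two-proportion z-test. alpha and power are assumed values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under "no difference"
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)


n = trials_per_fleet(0.92, 0.96)
print(f"battles needed per fleet: {n}")
```

Under those assumptions the answer comes out at several hundred battles for each of the two fleets, which is far more than anyone is going to play by hand.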
Fingers crossed you understand the silliness there. Yes, single tests are skewed towards specific types of ships.
Entire point was there is no 1 good metric, and it's easy to come up with equally good(bad) ones.
You still have yet to come up with one.
Why is speedrunning a 830/4 radiant stack more important than killing a 1000+/8 with lots of tachyons/plasmas? A single Ordo often has 3 Radiants, 6 can already show up in doubles.
I look at the typical Ordos fleet. The average Ordos fleet contains slightly less than 2 Radiants. You can certainly gear your fleet toward a more Radiant-heavy fleet if you want, but then your fleet won't fare as well against a more frigate/destroyer/cruiser-heavy fleet. So I look at the middle of the spectrum.
500% XP is easy with single player ship, no need for Ordos
No that's not the point. The point is that even if the player has a full fleet, i.e. 240 DP's worth of ships and 8 level 5 officers, they'd already be getting +500% XP bonus against triple Ordos, so there's no practical reason to look at quadruple or more Ordos. There's no benefit realistically to that, it's just more of the same. It just lengthens how long it takes to do each run.
Yes you can get +500% XP if you run a tiny fleet, but I'm looking at full-sized fleets generally.
(by the time you get to Ordo stacks XP generally doesn't matter anyway).
It's the complete opposite. XP is one of the main reasons to fight Ordos (cores being the other), because the XP is so good and you need XP to get SP.
If the amount of ships killed still matters (the 500*4 > 500*2 thing) then why stop at 2?
You are punching below max power anyway, why not go down to a single faction fleet/ordo?
I chose double Ordos because triple Ordos ended up giving pretty much the same results (with some minor modifications i.e. needing to take EMR/MS for enough missiles to last), just that each run took 50% longer. So I get to save 1/3 of the time spent testing this way.
Single Ordos is different than double Ordos, if you're running at 400 battle size. Single Ordos, assuming 2 Radiants, means that you get a Radiant at the beginning, and another one at the end. Whereas double Ordos, assuming 4 Radiants, means you have to deal with multiple Radiants at the same time at the end. So you get to see the fleet's performance against multiple simultaneous Radiants as well.
Having said that, sometimes I'll look at single Ordos if I'm looking at a Support Doctrine fleet, since it's so small XP-wise that it maxes out its XP bonus against a single Ordos, so a single Ordos is all the fleet needs to face. (Spoiler alert: playtesting what I think the new Manticore (LP) will be has been a lot of fun, and right now it's looking like a split Manticore (LP) / Brawler (LP) fleet is the fleet I'll be using for my first playthrough after the update, to gather statistics on Ordos fleets (which needs to be done one fleet at a time), since it's so good at churning through single Ordos fleets, and so easy to make by just going to LP planets and bases and buying their ships then killing their fleets.) But generally speaking for any fleet with a sizable number of officers, double Ordos is more representative.
Not doing single fleets skews results towards ships with durability, and is also unfair to Colossus MkII since hammer barrage has low ammo.
Sure, and it also skews against fleets made up of 120 Kites with Reapers. If the missile has such low ammo that it won't last through a significant fight, well, you'd want to know that, and the testing should reveal that kind of information.
Point was even if the only metric is time, frigates will kill small fry much faster than a Conquest could so bringing them will always speed things up (and is not unique to fleets with Onslaught/Paragon/whatever).
Destroying enemies fast is generally good, yeah. Just do not get tunnel vision about the speed, that leads to posts ranking the DPS of xyphos dominators.
Not necessarily. Frigates may be faster at grabbing objectives for example, but then you also have to consider how much they contribute for the remainder of the fight relative to the same DP of other ships that could've been used.
This points to why it's important to test against an entire Ordos fleet, instead of just concentrating on Radiants, and using actual campaign enemy fleets instead of sim. It gives the fleet's overall performance across the different phases of battle, from capturing objectives, to dealing with frigates/destroyers, to dealing with mass Brilliants, up to dealing with Radiants at the end. The overall battle DPS is an average score across all these phases, in the proportion that the player is expected to see them (hence why I chose "average" fleets in size and composition), without overly emphasizing one or the other. Hence why it's useful as a metric of fleet or ship performance.
I think you are right that taken by itself, it can lead to goofy results.
By "goofy results", my point is that the solution that it gives (i.e. the build that it determines to be the best) may not be at all useful for fleets that the player may encounter in the campaign. Since the goal of testing is to help inform as to what may be useful in a campaign battle.
I assume (let me know if you had something different in mind) that the process of finding the best builds A and B, in looking at A versus B, is to take a stock build B, look for the best build A (i.e. the build of A that leads to the highest win ratio against B), then once that's done, take that build A, look for the best build B, and then repeat back and forth until both sides determine that they can't do any better, i.e. converge. (Note that "converge" does not mean that the side wins, but it means that both sides can't increase their respective win ratios any further and thus the process ends.) The problem with that process is maybe the best build of A is "all rocks", then when you go to build B, it ends up being "all paper", then when you go back to build A, it ends up being "all scissors", then you go back to build B, it ends up being "all rocks", and so forth. At each iteration, each side is always trying to optimize against a different target, so there's no guarantee that this process will ever come to a converged solution.
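The cycling problem above can be sketched with a toy model. This is purely illustrative: the three "builds" are literally rock, paper, and scissors, and the win ratios (1.0, 0.5, 0.0) are made up for the example rather than measured from any ship.

```python
# Hypothetical toy model: BEATS[x] is the build that x defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
BUILDS = list(BEATS)


def win_ratio(a, b):
    """Win ratio of build a against build b in the toy model."""
    if a == b:
        return 0.5
    return 1.0 if BEATS[a] == b else 0.0


def best_response(opponent):
    """The build with the highest win ratio against a fixed opponent build."""
    return max(BUILDS, key=lambda build: win_ratio(build, opponent))


def alternate(start_a, start_b, max_rounds=12):
    """Alternately re-optimize A against B, then B against the new A.

    Returns (history, converged). converged is True only if a full round
    leaves both builds unchanged.
    """
    a, b = start_a, start_b
    history = [(a, b)]
    for _ in range(max_rounds):
        new_a = best_response(b)
        new_b = best_response(new_a)
        if (new_a, new_b) == (a, b):
            return history, True       # neither side can improve any further
        a, b = new_a, new_b
        revisited = (a, b) in history
        history.append((a, b))
        if revisited:
            return history, False      # back at an earlier pair: a cycle
    return history, False


history, converged = alternate("rock", "rock")
print(history, converged)
```

Starting from any pair, the two sides just chase each other around the triangle and the loop ends by revisiting an earlier pair instead of settling.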
And even if it converges, it may be because it's some build that the other ship has no good counter against as a platform, but which the typical campaign fleet (which is filled with a spectrum of different ship types) can easily counter. And hence it may end up being goofy, i.e. lead to some resulting build that does not perform well in actual campaign battles. For example, if the converged builds ended up being that an Onslaught beats a Conquest, the Onslaught build probably focused on having a lot of frontal firepower. But in an actual campaign battle, the enemy Conquests would have frigates and destroyers around (not to mention, the battle starts off with smaller ships with possibly 1 capital ship with them), and the Onslaught is not necessarily able to handle many small ships since the build would be specialized against one big ship. So the testing could result in a build that isn't actually that useful in fleet-on-fleet combat.
You also want to consider the manpower involved in doing this. Thus far I've released test results for 9 ships (10 if you consider the Conquest), which took several months. Each of those was basically trying out a bunch of different builds until I was reasonably confident that I've found the best one for that ship, including playing through the double Ordos battle several times with each of the most promising builds, and analyzing the results afterward. My spreadsheet for the results contains over 100 battles. So of all that data, I've essentially converged on a solution a total of 9 (or 10) times thus far. If you're testing A vs B with both sides able to change their build, each time it switches between optimizing A or optimizing B would be one of those steps. That's a herculean effort right there just to have them converge once, not to mention all the possible A vs B matchups that you can have.
Realistically I'm only able to play off-and-on depending on my work, and there are sometimes periods of weeks or even over a month when I don't play at all (nor post much on the forums), depending on how busy I get with work. So it's possible that someone with more free time to play the game can find results more quickly. But I'm not aware of any other larger-scale effort to quantitatively assess the effectiveness of different ships and compare their performance to each other. (As an aside, as Thaago mentioned elsewhere, WeiTuLo's analysis of fighters here is essentially the same approach as what I'm doing, but for fighters. WeiTuLo looked at minimizing time to kill, while I look at maximizing damage per second, effectively its reciprocal, because of how mature the Detailed Combat Results mod has become.) Thus a lot of the forum discussion just comes down to "I like this more" or "I think this is better" instead of giving something concrete for other forum posters to consider.
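A small sketch of why time to kill and damage per second are interchangeable when the enemy fleet is held fixed. The hit-point pool, the fleet labels, and the times below are placeholders I made up for illustration; the only assumption that matters is that every fleet has to chew through the same total amount of damage.

```python
# Placeholder value, NOT a measured figure: total damage needed to destroy
# the (fixed) enemy test fleet.
ENEMY_HP_POOL = 1_000_000


def battle_dps(time_to_kill_s, total_damage=ENEMY_HP_POOL):
    """Overall battle DPS if the whole pool is destroyed in time_to_kill_s seconds."""
    return total_damage / time_to_kill_s


# Hypothetical fleets with illustrative times to kill, in seconds.
times = {"fleet_a": 246, "fleet_b": 247, "fleet_c": 248}

by_ttk = sorted(times, key=times.get)  # fastest kill first
by_dps = sorted(times, key=lambda f: battle_dps(times[f]), reverse=True)  # highest DPS first

print(by_ttk == by_dps)
```

Since DPS is just a constant divided by the time, the two rankings always agree; which one you report comes down to which number is easier to read off your tools.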
Testing stock A vs stock B does tell you how stock A will do against stock B, which is a subset of the parameter space of how well A does against B.
Yes, but there's no way to know how good the stock builds are relative to the possible builds of A and B (at least until you explore the space of possible builds). So you can't draw any inferences on how well ship A can perform against ship B based on that.
Alex has to take those kinds of fights into account at least somewhat since new players exist and you don't want the learning curve to be too steep for them. Taking into account only end game maximally optimized performance has the potential to skew early game intro and mid-game play.
I think the balancing levers are different here. At the endgame, you can assume that the player has all ships available, all hullmods available, all weapons available (though I exclude Omega weapons in my testing right now), all officer skills available, etc. So the main constraint comes down to DP, and thus looking at a ship's effectiveness (which I measure via its overall battle DPS) relative to its DP is important; changing a ship's DP to match its effectiveness is relevant (along with adjusting weapons or its OP costs, etc.). So for those, I think they *should* be balanced around the endgame.
At the early to mid game, you have to assume that the player is running around with whatever ragtag ships, hullmods, etc., he's been able to scrounge up to that point in the campaign. So ship DP, weapon OP, etc. are not as useful of a balancing mechanism, since he's likely not filling out the whole DP limit, and he's using whatever weapons he's been able to get his hands on, etc. Instead, it's based more on things like ship/weapon availability, size and difficulty of enemy fleets, etc. The game provides a spectrum of different difficulty fleets starting with small d-modded pirate and pather fleets, to get the player comfortable with them, and let the player gradually advance to trying out harder fleets with better rewards as he progresses. Plus the player can learn to run away from fights he can't handle yet. So that ends up being more of the balancing mechanism there.
Also, since the player essentially gets a different set of available ships, available weapons, available hullmods, etc., for the early to mid game in each playthrough, it's hard to come up with common criteria for forum discussion that'll be applicable to all of them. I could say "well this ship with these weapons and these hullmods does really well in the mid-game" but that's not going to work for someone who is missing that ship or that weapon or that hullmod, etc. So basically, the testing would only apply to the subset of playthroughs that has access to everything that the tests assume. Whereas, testing for the endgame, you can assume that the player has everything available. Then the testing results give a concrete goal for the player to move toward as they work on improving their fleet.
A set of tests we are not doing, but probably could, is measuring how difficult NPC fleet compositions are as player opposition. Ideally, bounties that are at the same tier and reward should be roughly balanced against each other as well, so that a player knows what to expect going in after some experience.
Sure, although I think that comes down more to fleet generation than ship DP or weapon OP etc. In other words, that a 200k bounty is roughly equally difficult regardless of if it's a Hegemony fleet or Diktat fleet, etc. I *think* fleet generation is based on FP rather than DP though. I think it'd be better to continue using time to kill and/or overall battle DPS as a metric, because there are too many ways the player has to survive against the AI that would just make the testing drawn out. The player can just keep running away from the main enemy fleet until CR runs out for example.
I really enjoy your guys posts. What do you mean by (Nash) equilibrium in a single player game though?
This results from trying to find the best build for ship A that can beat ship B (i.e. find the highest win ratio against ship B), while simultaneously looking for the best build for ship B that can beat ship A. Ship A and Ship B basically become two competing and directly opposing agents, each with decision-making ability (in looking for their respective best builds). As each side searches for their respective best build, at some point they reach a point where they can no longer improve any further. If both sides reach this point, then they've basically achieved equilibrium, or what I'm calling converging.
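As a minimal sketch of what "neither side can improve any further" means, suppose (hypothetically) that you had a full table of win ratios for every build of A against every build of B. The tables below are made-up numbers, and treating B's win ratio as one minus A's is a simplifying assumption.

```python
def pure_equilibria(win):
    """Find (i, j) pairs where A's build i and B's build j are mutual best responses.

    win[i][j] is A's win ratio with build i against B's build j;
    B's win ratio is assumed to be 1 - win[i][j].
    """
    found = []
    for i, row in enumerate(win):
        for j, w in enumerate(row):
            # A cannot do better by switching builds against B's build j.
            a_cannot_improve = all(win[k][j] <= w for k in range(len(win)))
            # B cannot do better (i.e. lower A's win ratio) against A's build i.
            b_cannot_improve = all(row[k] >= w for k in range(len(row)))
            if a_cannot_improve and b_cannot_improve:
                found.append((i, j))
    return found


# Made-up table where a converged pair exists.
settles = [[0.6, 0.7],
           [0.4, 0.9]]

# Made-up rock/paper/scissors-style table where no such pair exists.
cycles = [[0.5, 0.0, 1.0],
          [1.0, 0.5, 0.0],
          [0.0, 1.0, 0.5]]

print(pure_equilibria(settles))
print(pure_equilibria(cycles))
```

The first table has exactly one pair of builds where both sides are stuck, which is the converged case. The second has none, so alternating optimization there has nowhere to stop.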
Let's say that we have a list of all possible enemy fleets...
In principle, yes. In practice however the space of possible player fleets is too large to easily explore, even if it's technically finite. The fleets' performance is also affected by the player's commands throughout the battle, as well as the player's own actions if he's piloting a flagship, and both of these again have a large variance which makes it difficult to easily explore.
Now I have read that it is supposedly good design that these choices would not be singular. I do not understand that perspective. If the solutions are not singular that is equivalent to the player having meaningless choices of strategy. Why would you want that?...
Now if you agree with the above then some consequences and observations...
There's a difference between "there are many good choices" and "all choices are automatically good". Good design means that the player has a lot of different possible strategies, and the optimum setup within each strategy would be about as good as the optimum setup within other strategies. In terms of Starsector, this would mean that the best missile-spamming fleet should perform about as well as the best high-tech fleet, which should perform about as well as the best carrier fleet, which should perform about as well as the best phase fleet, which should perform about as well as the best Safety Overrides fleet, etc. But it does not mean that all fleets should perform equally well, because there are going to be many fleet setups that are simply bad.
In terms of ships, since DP is a balancing metric for ships in Starsector (if a ship is more powerful, then making it higher DP, thus reducing how many of that ship the player is able to put on the battlefield at once relative to other ships, is a balancing mechanism to counter its power), then it makes sense to look at how well ships perform relative to their DP. In other words, good design would mean that 200 DP's worth of one ship, optimally built, should perform about as well as 200 DP's worth of another ship, also optimally built.
However, there's a deeper issue tucked in there: in Starsector, you can use a combination of different ships, and these ships may synergistically interact to create an even stronger fleet than spamming one type of ship or the other to make up your fleet. Or maybe they interact badly and make the fleet worse. And that's part of the fun, trying to figure out what combinations of ships work well.
But in terms of determining their worth, it becomes more awkward; you're no longer looking at how each ship performs by itself, but how it performs given that there is a ship B nearby. Also, the analysis of each ship's performance is more complex, since it's based on the capability of other ships as well. For example, if you have a large capital ship surrounded by smaller escort ships, then what you really get is the performance of that capital ship assuming that it has smaller escort ships killing off stray enemy frigates so it can concentrate its firepower on the bulk of the enemy fleet and so forth. So then you have to evaluate them together as a group, and compare that with other ships which total the same DP.
Yes this is a really difficult problem to try to address (to "solve" in some context). So for me, I'm starting off with looking at each ship's performance by itself, i.e. spamming the same type of ship, and looking at their performance. Doing this with ships that are relatively spammable (meaning: individually self-sufficient and do not need other types of ships to help support them) gives a baseline of performance which I can then use when looking at other ships and/or looking at ship combinations.
The class of ships that made the most sense to do this are the cruisers, and hence that's what I started with. Frigates and destroyers are too fragile, and capital ships are too vulnerable to getting swarmed by small ships. So the cruisers give a starting point as a basis for evaluating the relative power of different ships, and (by extension) their DP.
A fleet built against Ordos may not necessarily always be good against any other fleet, but it's a sufficiently high bar that it'll likely do well. Obviously Ordos requires a lot of anti-shield whereas you'd want to load up on anti-armor/hull against Derelicts instead, for example. So in theory, there should actually be a suite of different opponents, and different types of fleets would do well against different types of opponents. (And once there are a larger variety of endgame opponents to fight, that will likely be the approach that's needed.) In practice however trying to determine a good/optimal fleet against one opponent is already hard enough that I just went with the one that's the most relevant, the "biggest bang for your buck" with the effort put in.
About Conquests I think I've said my piece. Things seem bad for the statistical project, I suppose it was fun while it lasted. Conquest is excellent for farming Ordos though.
I think the main issue is that it got bogged down by stuff that, although they would give somewhat more accurate results in some sense, ultimately wouldn't really change the outcome much. For example, the focus on weapon arc, or the precise shape of the target when modeling where the weapon shot would hit. For me, it was good enough to take the double Ordos test fleet, note that the sum of the base armor values was 28k, and the average total armor damage that my fleets did (across multiple fleets) was 143k, and conclude that the total armor damage was around 5 times the sum of the base armor values, so it averages out to a hittable area of around 12 armor cells wide, and then move on from there.
It turns out that the discussion did produce probably the best Ordos-farming fleet I was able to find though: player-controlled flagship Onslaught, 3 Conquests (dual Squalls/Mjolnirs/HVDs/Harpoons, Graviton Beam, 4 Tac Lasers), and 2 Gryphons (Squall/Harpoons/Breaches/HVD). The beams are to maximize damage output, since there was OP left over; they're very OP- and flux-inefficient though, so they were more of a "might as well" option (adding around 8% more DPS), and would be the first to go if OP is needed. This fleet is able to kill double Ordos pretty much as fast as 8 Gryphons, so each Conquest really was worth about as much as 2 Gryphons (and Gryphons themselves are already very overpowered). The difference is that 3 Conquests use half the number of officers as 6 Gryphons, so you get a bigger XP bonus. An officered (level 5) Conquest costs 40 + 26.25 = 66.25 DP for XP bonus purposes, while 2 officered (level 5) Gryphons cost 40 + 52.5 = 92.5 DP for XP bonus purposes, or 40% more. So you basically get 40% more XP by using an officered Conquest instead of 2 officered Gryphons when they're fighting the bulk of the fleet; this fleet overall gets around 20% more XP than a fleet of flagship Onslaught and 8 Gryphons. It's small enough to average around +450% XP bonus against double Ordos, so it doesn't need triple Ordos to max out its XP bonus.
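The officer arithmetic above, written out as a quick check. The 26.25 per level 5 officer is simply the figure implied by the sums in this post (40 + 26.25 and 40 + 52.5); the constant and function names are mine, and this is not a reimplementation of how the game itself computes the XP bonus.

```python
# Per level 5 officer, as implied by the post's own sums (assumption, not game code).
OFFICER_L5_COST = 26.25


def xp_bonus_dp(total_ship_dp, officers):
    """DP counted for XP bonus purposes: ship DP plus a flat cost per level 5 officer."""
    return total_ship_dp + officers * OFFICER_L5_COST


one_conquest = xp_bonus_dp(40, officers=1)   # one 40 DP Conquest, one officer
two_gryphons = xp_bonus_dp(40, officers=2)   # 40 DP of Gryphons total, two officers

ratio = two_gryphons / one_conquest
print(one_conquest, two_gryphons, f"{ratio:.3f}")  # 66.25 92.5 1.396
```

So the two Gryphons count for just under 40% more than the single Conquest, which is where the rounded "40% more" comes from.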
The reason why you wouldn't use 4 Conquests instead is that you're then sending the Conquests to the flanks at the beginning, which is needed to prevent your fleet from being surrounded but costs a lot of DP and reduces your firepower against the bulk of the enemy fleet. So it's better to send cheaper ships instead, i.e. Gryphons, to the flanks. Hence the Gryphons take care of the flanks while the Conquests take care of the main enemy fleet, and then the Gryphons join up as the whole fleet moves toward the top to the enemy spawn point. It's possible that I could use cheaper ships to cover the flanks, but few ships can put out as much damage as the Gryphon once the enemy fleet has balled up.
The best time that I got with this fleet against my double Ordos test fleet was 247 seconds, compared with 246 seconds for 8 Gryphons, and 248 seconds for 10 Gryphons, basically all virtually identical. It might seem counterintuitive that more Gryphons didn't actually help, but that's because 1) I start with 200 DP and then deploy the last 2 once I capture the objectives, meaning they won't be able to contribute as much in the first place since they come in later and 2) they run into overcrowding issues at the end when they're all gathered at the enemy spawn point. For example, if a frontline Gryphon takes too much damage, when it's too crowded it can't back off because there's another Gryphon blocking its path, so it ends up overloading and then not doing any more damage for a while. So fewer ships actually meant that they're able to space themselves out better.
Obviously the meta will change in 0.96a, but this was the best one I found in 0.95.1a. The Ordos fleets in 0.96a do seem a lot more dynamic (i.e. Novae and Brilliants rush forward unexpectedly, so the battle lines aren't as stable and predictable) so this setup may need some modification, but chances are it'll work too. (And since 0.96a is now out, these test results are too late for Alex to swing the nerfhammer in this direction...I hope...)
P.S. I just realized I forgot to address the main topic of this thread, energy Onslaught. It turns out that the main issue when loading up on burst energy weapons is that the Onslaught (at least how I play it) is very much an in-your-face ship, running into the thick of combat. So the problem is that in actual use, there's not enough downtime for the energy weapons to recharge -- they're limited by their sustained fire rate due to ammo regen. So testing Light Needler vs Minipulser side by side, the Light Needler ended up doing more damage pretty much every time.
However, in 0.96a, with Expanded Magazines increasing the ammo regen rate when s-modded, this significantly changes that balance. So 0.96a may be when energy Onslaught or energy Retribution or energy whatever comes to the fore as a perfectly viable playstyle; its time has come.