Hierarchical data-driven approach to fitting numerical relativity data for nonprecessing binary black holes with an application to final spin and radiated energy

Numerical relativity is an essential tool in studying the coalescence of binary black holes (BBHs). It is still computationally prohibitive to cover the BBH parameter space exhaustively, making phenomenological fitting formulas for BBH waveforms and final-state properties important for practical applications. We describe a general hierarchical bottom-up fitting methodology to design and calibrate fits to numerical relativity simulations for the three-dimensional parameter space of quasicircular nonprecessing merging BBHs, spanned by mass ratio and by the individual spin components orthogonal to the orbital plane. Particular attention is paid to incorporating the extreme-mass-ratio limit and to the subdominant unequal-spin effects. As an illustration of the method, we provide two applications, to the final spin and final mass (or equivalently: radiated energy) of the remnant black hole. Fitting to 427 numerical relativity simulations, we obtain results broadly consistent with previously published fits, but improving in overall accuracy and particularly in the approach to extremal limits and for unequal-spin configurations. We also discuss the importance of data quality studies when combining simulations from diverse sources, how detailed error budgets will be necessary for further improvements of these already highly accurate fits, and how this first detailed study of unequal-spin effects helps in choosing the most informative parameters for future numerical relativity runs.


I. INTRODUCTION
According to general relativity, compact binary black holes (BBHs) coalesce through the emission of gravitational waves (GWs), as already observed by LIGO in at least two cases [1][2][3][4]. The merger remnant is a single Kerr BH [5] characterized only by its final spin and mass. An essential and robust tool for predicting BBH evolution, since the 2005 breakthrough [6][7][8], is numerical relativity (NR). Due to the large computational cost of each NR simulation, it is still computationally prohibitive to cover the BBH parameter space exhaustively. Hence it is natural to develop simple, yet accurate model fits to the existing set of NR simulations. One then has to study their quality of interpolation between NR simulations, and of extrapolation to regions of parameter space beyond the calibration range. For example, the phenomenological inspiralmerger-ringdown waveform models of [9][10][11][12][13], though used very successfully as one of two waveform families for LIGO O1 data analysis [1][2][3][4], still include only a limited set of physical effects and were calibrated to small sets of NR simulations with relatively simple fitting methods.
In this paper we develop a more general, hierarchical fitting approach for BBH properties and waveforms. The fit ansatz functions are developed through studying hierarchical structures present in the NR data set itself, and their complexity tailored to the actual predictive power of the data set by the use of information criteria.
We first apply this method to the spin and mass of the remnant black hole, which determine the frequencies of the quasinormal-mode ringdown [14][15][16][17] to a Kerr black hole. The ringdown is a crucial part in modeling full inspiralmerger-ringdown waveforms [9][10][11][12][13][18][19][20]; and from param-eter estimation with the full waveforms, the final mass and spin can be estimated with accuracy similar to other BBH parameters [2,4]. Future observations of strong GW signals will allow to test general relativity through consistency tests between inspiral, merger and ringdown [21], significantly improving upon [22].
For this paper, we concentrate on nonprecessing quasicircular BBHs, where the black hole spins are parallel or antiparallel to the total orbital angular momentum of the binary. These configurations are fully described in a threedimensional parameter space: given the masses m 1,2 and physical spins S 1,2 , we use the two component spins χ 1 = S 1 /m 2 1 and χ 2 = S 2 /m 2 2 and the mass ratio, given either as q = m 1 /m 2 with the convention m 1 > m 2 , or as the symmetric mass ratio η = (m 1 m 2 )/(m 1 + m 2 ) 2 = q/(1 + q) 2 . The total mass is only a scaling factor, and here we work in units of m 1 + m 2 = 1.
As suggested by the post-Newtonian (PN) results [36][37][38][39] for radiated energy and angular momentum, and confirmed by many previous studies of NR-calibrated models [9,18,40,41], the two dominant parameter dependencies of both final spin and final mass are on the mass ratio and on some appropriately chosen effective spin parameter of the binary, whereas the effects of any difference between the two individual spins are much smaller. However, only by accurately modeling these small unequal-spin effects can the full two-spin information be extracted from GW observations, disentangling any true physical degeneracies from systematic effects due to limitations of the waveform models.
A similar, but more complex, situation is encountered in the calibration of phenomenological models to precessing binaries, where so far a simple waveform model based on a sin-arXiv:1611.00332v2 [gr-qc] 27 Mar 2017 non-spinning data equal-mass equal-spin data 1D fits in η 1D fits in S 2D ansatz η, S equal-spin data extreme-mass-ratio limit constrained 2D ansatz η, S 3D fit η, S, χ 1 − χ 2 remaining data combine best 1D ansätze unequal-spin corrections FIG. 1. Flowchart of the hierarchical step-by-step construction leading to a three-dimensional ansatz and fit for the quantity of interest over the (η, χ 1 , χ 2 ) ≡ η, S , ∆χ space.
gle effective precession parameter [11][12][13] has successfully been employed to analyze the first gravitational wave detections [2,4]. (A complementary precessing analysis [42] based on the model of [20,43] has also been published recently. ) The current work, while mainly a first step to develop models for the full three-dimensional parameter space of nonprecessing waveforms, can also be considered as a toy model for the numerical calibration of subdominant effects in generic binaries, such as precession and higher modes.
For the current three-dimensional aligned-spin parameter space, in addition to the hierarchy of mass ratio, effective spin and spin difference, the sampling by available NR simulations still displays significant bias toward simple subsets: namely the one-dimensional subspaces of nonspinning cases (η dependence only) and equal-mass, equal-spin cases (η = 0.25, q = 1, χ 1 = χ 2 , thus effective-spin dependence only) are covered particularly well by existing NR catalogs, while few simulations exist for unequal spins, high spins, and/or very unequal masses. Independently, the extreme-mass-ratio limit (η → 0, q → ∞, arbitrary spins) is known analytically [44].
Here we exploit and investigate this structure by parametrizing spin effects in terms of an effective spin and a spin-difference parameter. As the effective spin we choose which has already been found to work well for final-state quantities in [9]. We discuss other possible choices for the effective spin, for which our method also works robustly, in Appendix C. For spin difference, we use simply ∆χ = χ 1 − χ 2 , which makes no assumptions on how spin-difference effects depend on mass ratio. We develop our hierarchical approach, with the aim to ensure an accurate modeling of the subdominant spin-difference effects, along the lines illustrated as a flowchart in Fig. 1: First we consider the one-dimensional subspaces of nonspinning and of equal-mass-equal-spin black holes. We then combine and generalize these subspace fits, adding additional degrees of freedom to cover the entire two-dimensional space of equal-spin black holes, but constrain the generalized ansatz with information from the extreme-mass-ratio limit. In a third step, we investigate the leading subdominant terms, which are The NR data set used in this paper, over mass ratio q = m 1 /m 2 and the two dimensionless spin components χ 1 , χ 2 , with color indicating the source catalog. Bright green points are cases removed from the analysis for data-quality reasons, as discussed in Appendix A. linear in the difference between spins, and also identify additional nonlinear spin-difference terms. We finally produce a three-dimensional fit to the complete data set with the hierarchically constructed ansatz. This way, we can construct a full ansatz with a relatively low number of free fit coefficients and avoid overfitting of spurious effects only due to small sample sizes, while still capturing the essential physical effects that are known from the well-constrained regions.
At each step, we evaluate the performance of different fit choices by several quantitative measures: by the overall residuals, by the Akaike and Bayesian information criteria (AICc, BIC, [45,46]), and by how well determined the individual fit coefficients are. The information criteria are model selection tools to choose between fits with comparable goodness of fit but different degrees of complexity, i.e. they penalize high numbers of free coefficients. See Appendix B for details on these statistical methods.
Previous published fits for final spin and/or mass include [9,27,[47][48][49][50][51][52][53][54][55], and we will compare our results to the most recent results in the literature, including both fits across the full three-dimensional nonprecessing parameter space, and the effective-spin-based fits from [9], which were used in the aligned-spin IMRPhenomD waveform model [9,10] and (with in-plane-spin corrections) in the precessing IMRPhe-nomPv2 model [11][12][13]. These will be referred to as the "Phe-nomD fits" in the following. The plan of the paper is as follows: We first describe our data sets, built from several NR catalogs, in Sec. II, together with the available extreme-massratio limit information. In Sec. III we develop the general fitting recipe for the example of the final spin. We then apply our method also for final mass -or equivalently, radiated energy -in Sec. IV, which illustrates some of the specific choices and adaptations required to apply the general method to each quantity. We summarize our method and results in Sec. V, and give additional details about NR data, fit construction, and fit uncertainty estimates in Appendices A-D.
Our fits for final spin and radiated energy are also provided in Mathematica and python formats as supplementary material [56], and implemented under the label "UIB2016" in the nrutils.py package of LALInference [57,58]. This final version of the paper uses a larger NR calibration set than the initial arXiv submission (1611.00332v1), with fit results fully consistent but better constrained. FIG. 3. Input data plotted against symmetric mass ratio η and effective spin S . Data consist of the combined set of NR simulations (colored points) and the analytically known [44] extreme-mass-ratio behavior (black line). Left panel: final spin; right panel: radiated energy rescaled by η. Both χ f and E rad /η follow a smooth surface in this space, and the well-constrained 1D subspaces together already give a good indication of its curvature.

A. Numerical Relativity data sets
We combine four data sets of aligned-spin numerical relativity BBH simulations from independent codes and sources: the public catalogs of SXS [59,60], RIT [52,61] and GaTech [62,63]; as well as a set of our own simulations with the BAM code [9,64,65], including 27 new cases for which initial configurations and results are listed in Table XIV in Appendix A. After removing 16 cases from the combined data set due to data quality considerations as discussed in Appendix A, we have 161 cases from the SXS catalog, 107 from RIT, 114 from GaTech and 45 from BAM; for a total of 427 cases. The sampling of our three-dimensional parameter space by the four data sets is shown in Fig. 2. The initial arXiv version of this paper (1611.00332v1) used a smaller data set of 256 NR cases, with the increase coming from an update of the public SXS catalog and new RIT results from [61].
To obtain a qualitative understanding of the hierarchical structure in the two-dimensional parameter space of mass ratio and effective spin, in Fig. 3 we show the NR data set over the (η, S ) plane together with the analytical extrememass-ratio results, discussed below in Sec. II B. For both final spin and radiated energy, we find a reasonably smooth surface spanned by the NR data points. These plots already suggests that -together with the known extreme-mass-ratio results to compensate the sparsity of NR simulations at increasingly unequal masses -good one-dimensional fits in the two best-sampled one-dimensional subsets (equal-massequal-spin and nonspinning BHs) will significantly constrain any two-dimensional fits.
Furthermore, as a first quantitative check that the assumption of a hierarchical structure in three-dimensional BBH parameter space holds, with unequal-spin effects subdominant to the dependence on η and S , we can study the residuals of this data set under the two-dimensional PhenomD fits [9]. For final spin, we find that 90% of relative errors are below 3%. (The only cases over 10% are those with absolute values close to zero, where relative error is not a good measure, and absolute errors (residuals) are limited to ∆χ f ≤ 0.025.) Still, this com- FIG. 4. Relative errors in final spin of the combined NR data set for this paper under the two-dimensional PhenomD fit [9]. parison suggests that unequal-spin effects make a large contribution to these small errors, as shown by four times smaller 90% quantiles when restricting to equal-spin cases only. See also Fig. 4 for histograms of these distributions. For radiated energy, 90% of relative errors are below 2%, with a reduction of that quantile by 1.4 for equal-spin cases only, indicating that spin-difference effects are even smaller for this quantity, which we will also see confirmed in our final results.
For details about extraction of final-state quantities, NR data quality and weight assignment, see Appendix A. As explained there, we do not have a full set of NR error estimates available, so we assign heuristic fit weights to each case based on the expected accuracy of the respective NR code in that particular parameter space region. For example, high-massratio cases are down-weighted more for puncture codes.
B. Extreme-mass-ratio limit The computational cost of numerical simulations of BH binaries in full general relativity diverges in the extreme-massratio limit η → 0. However, this limit is also equivalent to the much simpler case of a test particle orbiting a Kerr black hole. The energy and orbital angular momentum for that configuration have long been known analytically [44]: inserting the radius of the innermost stable circular orbit (ISCO) from Eq. (2.21) of [44] into Eqs. (2.12) and (2.13) of the same reference yields the test-particle energy (equivalent to the radiated energy) and orbital angular momentum at ISCO: Note that both E ISCO and L orb,ISCO depend linearly on η. In the test-particle limit, the small BH plunges after reaching the ISCO, and further mass loss scales with η 2 [66]. Similar to previous work [47,52,55,67], we will exploit this fact to compute the final spin and radiated energy to linear order in η from the analytical expressions, Eq. (2), holding at the ISCO. To linear order in η, we thus simply have E rad = E ISCO or M f = 1 − E ISCO for the final mass, and for the final spin χ f we obtain the implicit equation where the individual BH spins can be written in terms of our effective spin as Equation (4) can then be solved numerically for the final spin χ f as a function of η and of the effective spin S . Since this result holds to linear order in η, and assuming that the final spin and mass are regular functions of η, we have thus essentially computed the derivatives ∂E rad /∂η and ∂χ f /∂η at η = 0, in addition to the values at η = 0, which are E rad (0) = 0 and χ f (0) = S 1 /M 2 . Additionally, assuming that the final state is indeed a Kerr BH, its final spin has to satisfy χ f ≤ 1. One would also expect the final spin for maximal effective spin, S = 1, to decrease monotonically with increasing η. To construct an accurate fit in a neighborhood of S → 1 that satisfies these expectations -in particular the Kerr limit -we will constrain our ansatz with the analytically computed value of χ f = ∂χ f /∂η at (η = 0, S = 1). By perturbing Eq. (4) around {η → 0, χ f → 1} to linear order before taking the derivative in η at the same point, we find Several variations of this procedure have been used for previous final-spin fits, and differences are due to previous works neglecting the radiated energy in Eq. (4) [47,52], or not enforcing the derivative for satisfying the Kerr limit [55].

III. FINAL SPIN
We will now first develop the details of our hierarchical fitting procedure for the example of the final spin of BBH merger remnants, giving more detail here than we will do for the radiated energy in Sec. IV.

A. Choice of fit quantity
We first need to decide which quantity exactly we want to fit. It appears natural to fit a quantity related to the "final" orbital angular momentum L orb near merger, i.e. separating out the known initial spins S i . This is particularly useful in connection with the extreme-mass-ratio limit, since with Eq. (2)b, L orb is linear in η to leading order. We can use the relation from Eq. (4) between L orb and the dimensionless Kerr parameter χ f of the remnant BH, M 2 f χ f = L orb + S 1 + S 2 = L orb + S , also outside the extreme-mass-ratio limit. Here M f is the final mass of the remnant BH.
Instead of the actual angular momentum L orb , we take the liberty of fitting the quantity L orb = M 2 χ f − S , where (as throughout the paper) M is set to unity. This way, all fit results are easily converted to the final Kerr parameter χ f by adding the total initial spin S , and no correction for radiated energy has to be applied.

B. One-dimensional subspace fits
Motivated by the the unequal sampling of the parameter space by NR simulations, as visualized in Fig. 3, we start our hierarchical fit development with the simplest and bestsampled subspaces of the NR data set, constructing onedimensional fits L orb η, S = 0 and L orb η = 0.25, S over 92 nonspinning and 37 equal-mass-equal-spin cases. We do not restrict ourselves to polynomial fits, and also include ansätze in the form of rational functions. We have also found good fits for more general functions, but omit these here since we have not explored that option systematically.
All fits are performed with Mathematica's NonlinearModelFit function, but also partially crosschecked with the nls package of R. Since rational functions can have singularities, codes such as NonlinearModelFit may not converge to a valid solution without good starting values for the coefficients. We solve this problem by first performing a sufficiently high-order polynomial fit, from which we compute a Padé approximant at the desired order, and use the coefficients of this approximant as starting values for the rational-function fit. We denote rational functions with a numerator of polynomial order m and denominator of polynomial order k as an ansatz of order (m, k). Before fitting to the NR data, all ansätze are constrained by two facts: For nonspinning BBHs, both χ f and L orb have to vanish for η → 0, so that there can be no constant term in the ansatz. Furthermore, from the extreme-mass-ratio prediction Eq. (2), it also follows that the spin-independent coefficient linear in η is 2 √ 3 (see also [68]). We will include spin-dependent information linear in η in Sec. III C.
Thus, we obtain L orb η, S = 0 fits for a large set of polynomial and rational functions. Several of them produce competitive goodness of fit, as measured by the root-mean-squareerror (RMSE) or the full distribution of residuals. However, we do not want to overfit the data, which could induce spurious oscillations in the region of very unequal BH masses that is not covered by NR data. Hence, we rank the fits by information criteria penalizing superfluous free coefficients. Figure 5 shows the top-ranked fit in terms of Schwarz's Bayesian information criterion (BIC), which is a rational func-FIG. 5. Nonspinning NR data and one-dimensional L orb η, S = 0 fit as a function of mass ratio η. Top panel: best fit in terms of the Bayesian information criterion (BIC), a rational function R(3,1), see Eq. (7). Lower panel: residuals (∆L orb = data−fit) of this fit (points) and differences from the three next-best-ranking fits in terms of BIC (lines). See also Fig. 29  tion of order (3,1): The fit coefficients a i along with their uncertainties are given in Table I; all are well determined. The exact ranking of fits can depend on the choice of fit weights (see Appendix A) and on the ranking criterion, but we find that Eq. (7) is topranked by both BIC and AICc. While only ranked 6th by RMSE, none of the considered fits is better than Eq. (7) by more than 6% in that metric either. Additionally, under variations of the weighting scheme, this is robustly the fit among the top-ranked group -by all three criteria -with the lowest number of fitting coefficients, indicating it is a robust choice. For comparison, a simple third-order polynomial (two free coefficients) is disfavored clearly, by more than a factor of 8 in RMSE and an offset of +452 in BIC, and a fourth-order polynomial (three free coefficients, just as Eq. (7)) by almost a factor of 2 and by +332 respectively. The lower panel of Fig. 5 also compares the preferred fit both to the NR data and to the three next-best ranking fits by BIC. We find that the residuals are centered around zero with  no major trends, while the differences among high-ranked fits are much smaller than the scatter of residuals for the wellcovered high-η range, and that the "systematic uncertainty", as indicated by the difference of high-ranked fits, is still at the same level even in the extrapolatory low-η region. The BIC ranking for this example is also illustrated in Fig. 29 in Appendix B. Before the second 1D fit in the effective spin parameter S , we need to decide how we are later going to construct a 2D ansatz combining both 1D fits. We can either subtract or divide the NR data by the nonspinning fit, and find that subtraction exhibits a simpler functional form. Thus we decide to construct a 2D ansatz as the sum of nonspinning and spinning contributions. (We will choose a product ansatz for the radiated energy discussed in the next section.) We thus constrain the constant term of the 1D ansatz in S to reproduce the η = 0.25 nonspinning result, i.e. L orb (η = 0.25, S = 0) must be identical for both 1D fits.
The top-ranked L orb η = 0.25, S fits by BIC are shown in Fig. 6, and again we find a unique top-ranked fit by both AICc and BIC, a rational function of order (3,1): with four fit coefficients b i as listed in Table II. This ansatz is ranked 8th by RMSE, but with only 3% difference from the lowest RMSE, which is attained by a P(5) fit with one more coefficients, marginally disfavored by about +1.7 AICc and +2.6 in BIC. The best three-coefficient fit R(2,1) is significantly worse, with differences of over +40 in BIC/AICc and 40% in RMSE. Again, the distribution of residuals is well behaved, and differences between the four top-ranked fits by BIC are smaller than the scatter of residuals.

C. Two-dimensional fits
Next, we want to construct a two-dimensional fit covering the (η, S ) space, as it was illustrated in Fig. 3, by combining both the 1D subspace fits and the extreme-mass-ratio limit. As discussed above, we take the sum of Eq. (7) and the spindependent terms of Eq. (8). We introduce the necessary flexibility to describe 2D curvature and the extreme-mass-ratio limit by generalizing the S -dependent terms, inserting a polynomial of order J in η for each b i through the substitution The general 2D ansatz is thus Here we choose to expand to third order in η (J = 3), which is the lowest order leaving enough freedom to incorporate all available constraints from the 1D fits and the extreme-massratio limit, and, as evidenced by the residuals we find below, FIG. 8. Two-dimensional L orb η, S fit, visualized as L orb η, S /η. Application of the extreme-mass-ratio limit helps in avoiding extrapolation artifacts which would otherwise appear in low-η, high-| S | regions that are uncovered by NR simulations.
also high enough to adequately model this data set. Of the resulting 16 coefficients, the three f i0 in the numerator must vanish to preserve the L orb η = 0, S = 0 limit, while consistency with the equal-mass fit from Eq. (8) provides four constraints which we use to fix the f i3 terms: Four more coefficients are fixed by the extreme-mass-ratio information discussed in Sec. II B: we re-express Eq. (4) in terms of L orb /η and fit the discretized quantity where 2 √ 3 is the linear contribution from the nonspinning part (cf. Eq. (7)) and the χ f (η → 0, S ) values are obtained by solving Eq. (4) numerically for small η. Before fitting, we apply the derivative constraint from Eq. (6), which for the sum ansatz Eq. (10) implies a coefficient constraint We find this extra physical constraint to be essential in avoiding superextremal χ f results due to fitting artifacts. The extreme-mass-ratio limit fit coefficients are listed in Table III, and the improved agreement between analytical results and this new fit, as compared with the previous fit of [9], is illustrated in Fig. 7.
In summary, after constraining to the well-covered onedimensional NR data subsets and the analytically known extreme-mass-ratio limit, the 2D ansatz from Eq. (10) has reduced from 16 to 5 free coefficients. We fit it to the 60 remaining equal-spin NR cases that were not yet included in the 1D subsets. To remove possible singularities in the η, S plane for this rather general rational ansatz, we set the least-constrained denominator coefficient f 52 to zero. Thus we obtain a smooth four-coefficient fit as shown in Fig. 8. It has RMSE four times larger than the 1D η fit and twice as FIG. 9. Examples of spin-difference behavior at fixed mass ratios, for residuals ∆L orb after subtracting the two-dimensional L orb η, S fit, as defined in Eq. (14). Top row: q = 1; lower row: q = 4; left column: surfaces in S , ∆χ, ∆L orb space; right column: projections onto the ∆χ axis with linear and quadratic fits. At equal mass, the surface is parabolic, with the linear term (blue line) and mixture term (not shown) vanishing, but a clear quadratic dependence (orange line). At q = 4 and other intermediate mass ratios, the surface is very close to flat and the linear term dominates.
high as the S fit, but these residuals are still smaller than expected unequal-spin effects and rather noisily distributed without any clear parameter-dependent trends, thus indicating that the 2D fit sufficiently captures the dominant η, S dependence to form the basis of studying subdominant spin-difference effects. The least-constrained coefficient at this point has a relative error of about 25%, which is good enough to keep it in the ansatz for the next, 3D step where we will refit to a larger data set.

D. Unequal-spin contributions and 3D fit
Now the final step in the hierarchical procedure is to explore the subdominant effects of unequal spins, parametrized by the spin difference ∆χ = χ 1 − χ 2 . We first study the residuals of the 238 unequal-spin NR cases under the equal-spin 2D fit: We do this at fixed steps in mass ratio, having sufficient numbers of NR cases for this analysis at mass ratios q = {1, 1.33, 1.5, 1.75, 2, 3, 4, 5, 6, 7, 8}. This per-massratio analysis is only used to guide the construction of the full 3D ansatz and as a consistency check, while the final full 3D fit will consist of fitting the constrained 2D ansatz plus spindifference terms directly to the full data set.
At each mass ratio, we visually inspect the residuals, which span 2D surfaces in χ 1 , χ 2 , L orb or, equivalently, S , ∆χ, L orb space. As illustrated in Fig. 9, we find surfaces close to a plane, indicating a dominant linear dependence on ∆χ and possibly a mixture term S ∆χ. The exception is at equal masses, where quadratic curvature in the ∆χ dimension dominates. In this case, exchange of χ 1 and χ 2 yields an identical binary configuration, so that terms linear in ∆χ indeed have to vanish. We have also exploited this fact in the q = 1 analysis by adding mirror duplicates of each NR data point. Motivated by these empirical findings and symmetry argument, we introduce up to three spin-difference terms, The full 3D ansatz is then simply the sum of Eqs. (10) and (15): Adding higher orders in the effective spin or spin difference is not supported by visual inspection. At each mass ratio, we now perform four fits in ∆χ for the values of the A i : linear, linear+quadratic, linear+mixed, or the sum of all three terms. Examples are also shown in Fig. 9. We then collect the coefficients of each of these fits and use them as data A i (η) to be fitted as functions of mass ratio (see the 'per-mass-ratio data' in Fig. 10), using as weights the fit uncertainty from each mass ratio rescaled by the average data weight for that mass ratio. We also apply what we know about the extreme-mass-ratio and equal-mass limits: all three A i (η) FIG. 10. Spin-difference behavior of final-spin data after subtraction of the two-dimensional L orb η, S fit, showing the results of fits as in Fig. 9 at η steps corresponding to q = {1, 1.33, 1.5, 1.75, 2, 3, 4, 5, 6, 7, 8} and three estimates for the three ansatz functions A i (η) from Eqs. (15) and (19): (i) unequal-spin part of the final 3D fit from Eq. (16) ("direct 3D fit"), (ii) fit of the unequal-spin terms from Eq. (19) ("fit to residuals") to the residuals of the 2D fit from Eq. (10) over all mass ratios, (iii) fits of Eq. (19) to the per-mass-ratio results. Top-left panel: linear term A 1 only. The remaining panels are for the combined linear+quadratic+mixture fit, in clockwise order: linear term A 1 , quadratic term A 2 and mixture term A 3 . The A 1 results from the combined fit are very similar to those from the linear-only fit, demonstrating the robustness of extracting leading-order spin-difference effects. For the two lower panels, data points for low η are outside the displayed range, but the error bars are huge and hence this region does not contribute significantly to the weighted per-mass-ratio fits. In the direct 3D fit to the full data set, however, low-η information can be better incorporated, leading to the somewhat different shape of the mixture-term fit. See Sec. III E for more discussion of how well constrained these shapes actually are with the current data set. have to vanish in the limit η = 0, and the A 1 , A 3 linear in ∆χ have to vanish for η = 0.25. We thus choose ansätze of the form [38,39], and for the term quadratic in ∆χ. We find that the data can be well fit without any higher-order terms and by reducing some of the freedom of these three terms exploratory fits keeping all coefficients free give results close to integer numbers for the p i , q i = 1 and d 21 = 0. Hence we choose the three parsimonious ansätze The blue points and lines in Fig. 10 show these per-massratio results. The shape and numerical results of the dominant linear term A 1 are quite stable under adding one or two of the other terms. Fitting two terms, either linear+quadratic or lin-ear+mixture, yields quadratic/mixture effects of very similar magnitude, with the quadratic term following the same basic shape (an intermediate-mass-ratio bulge) as the other two. However, combining all three terms, the results match better with the expectations from symmetry detailed before, with the bulge shape limited to the linear and mixture terms while the quadratic term provides a correction mostly at similar masses.
Using again the q = 1, S = 0 and η → 0 constraints on the general ansatz from Eq. (16), we end up with a total of nine free coefficients in this final step. We now fit to 298 cases with arbitrary spins not yet used in the 1D fits, with results given in Table IV. Together with the coefficients from Tables  I-III, these fully determine the fit. To convert back from our fit quantity L orb to the actual dimensionless final spin χ f , just add the total initial spin S = m 2 1 χ 1 + m 2 2 χ 2 .
We find that the data set is sufficiently large and clean, and the equal-spin part modeled well enough from the 2D step, to confidently extract the linear spin-difference term and its ηdependence, which is stable when adding the other terms; and to find some evidence for the combined mixture and quadratic terms, whose shape however is not fully constrained yet.

E. Fit assessment
In Fig. 10 we also compare the spin-difference terms from this final "direct 3D" fit to those obtained from the per-massratio residuals analysis. The linear term is fully consistent, confirming that it is well determined by the data, while for the quadratic and mixture terms both approaches agree on the qualitative shape, but do not match as closely. Under the chosen ansätze, the 3D fit coefficients even for those terms are tightly determined (see Table IV). However, we have explicitly chosen the spin-difference terms in Eq. (19) to achieve this goal, while several other ansatz choices (changing the fixed exponents of the multiplicative η or 1 − 4η terms, or adding more terms with free coefficients in the η polynomials) can produce fits that are indistinguishable by summary statistics (AICc, BIC, RMSE). Still, most of these have some strongly degenerate and underconstrained coefficients, while the reported fit has the desirable property of sufficient complexity to be within the plateau region of summary statistics while not having any degenerate coefficients.
Yet, the shape of the functions A 2 (η) and A 3 (η) for the mixture and quadratic terms is not actually as closely constrained from the current data set as the coefficient uncertainties alone seem to imply, due to this ambiguity in ansatz selection. This becomes clear from the comparison of direct 3D fit and permass-ratio analysis in Fig. 10. The per-mass-ratio analysis also demonstrates that the data at mass ratios η < 0. 16 are not yet constraining enough to help characterize these terms. (The error bars are so large, and hence the weights so low, that they effectively do not contribute to the fit.) It also becomes clear that additional unequal-spin data at intermediate mass ratios would be very useful in constraining the A 2,3 (η) functions. Meanwhile, it is important to note again that the leading linear spin-difference term is already determined much more narrowly and robustly with the current data set.
We can further assess the success of the hierarchical 3D fitting procedure by comparing • a 2D fit (equal-spin physics only) to equal-spin NR cases only (same as in Fig. 8), • a 2D fit (equal-spin physics only) to all NR data, • and the 2D part of the full 3D fit.
As shown in Fig. 11, fitting the 2D equal-spin ansatz to the full data set induces strong curvature in the (η, S ) plane, which the full 3D fit is able to correct by the additional degrees of freedom in the spin-difference dimension. This is how it was possible to pull out the subdominant spin-difference effects FIG. 11. Green: Difference ∆L orb of a 2D fit (equal-spin physics only) to the full data set minus the 2D fit to equal-spin cases only, both including extreme-mass-ratio constraints. The strong curvature at intermediate mass ratios and nonzero spins is due to the equal-spin-physics-only fit trying to compensate for the addition of unequal-spin NR cases. Orange: Difference ∆L orb of the 2D part of the 3D fit to the full data set minus the 2D-only fit to equal-spin data. The bulk of the parameter space is no longer distorted, and only at high effective-spin magnitudes a small opposite effect to the η-dependent behavior of the spin-difference terms (cf. Fig. 10) can be seen. TABLE V. Summary statistics for the various steps of the hierarchical final-spin fit. Note that it is not meaningful to compare AICc and BIC between data subsets of different sizes. There is statistical preference for the 3D fit including all three linear+mixture+quadratic terms, although many different choices of the A i (η) ansatz functions yield similar results with just ± a few percent in RMSE and ± a few in AICc/BIC, so that the shape of the mixture and quadratic terms is not yet fully constrained.
with this enlarged data set. The same conclusion is supported by the comparison of summary statistics between the various steps and 2D/3D fit variants in Table V, showing that the RMSE only increases by 50% from the 2D equal-spin case to the full 3D fit using all data.
The distribution of fit residuals for the full data set, projected onto the (η, S ) plane, is shown in Fig. 12, and a comparison of fit residuals with other previously published fits, over the calibration data set of the current work, is shown as histograms in Fig. 13 and summarized in Table VI along with AICc and BIC metrics. The shape of the distributions is consistent, and for all fits the means are much smaller than the standard deviations, showing no evidence for any systematic bias. Our new fit improves significantly over the previous fit [9] used in the calibration of the IMRPhenomD waveform model [10], and also yields some improvement over recent fits from other groups [55,61], even when those ansätze are refit to our present NR data set.
Refitting our final hierarchically obtained ansatz directly to the full data set produces slightly better summary statis-   Tables I, II and IV, not counting those constrained from the extreme-mass-ratio limit. We also show results for refitting previous ansätze to the present NR data set, for a refit of our hierarchically obtained ansatz directly using the full data set, and for the same fitting procedure, but using uniform weights.
Comparison of this work with previously published fits [9,55,61] in the limit of extremal aligned spins, χ 1 = χ 2 = 1. The shaded region shows our fit's 90% confidence interval, which is narrow enough to indicate that discrepancies with the referenced fits are significant and due to the different ansatz constructions, especially in the extreme-mass-ratio limit (cf. Sec. II B), and not just a consequence of insufficient data.
tics, but also allows uncertainties from the less well-controlled unequal-spin set to influence the other parts of the fit, while the stepwise fit gives better control over the extreme-massratio behavior and better-determined coefficients for the wellconstrained subspaces.
As a further test of robustness, we have repeated the hierarchical fitting procedure with uniform weights instead of the weights used so far and discussed in Appendix A. This yields a fit consistent with our main result, though slightly less well constrained, but still improving over previous fits, thus demonstrating the robustness of the hierarchical fit construction under weighting choice.
We have also verified that our new fit does not violate the χ f ≤ 1 Kerr bound, particularly in the extreme-spin limit ( S = 1) and at low η, see Fig. 14.

F. Precessing binaries
While some existing final-spin fits [53][54][55] also include a calibration to precessing cases, it is also possible to use a simple "augmentation" procedure [49] (see also [67]) for alignedspin-only calibrations by adding the contribution of in-plane spins in quadrature to the aligned-spin fit result: This procedure is known to significantly improve accuracy and reduce bias for precessing binaries. For example, it has been applied to the aligned-spin PhenomD fit [9] for the precessing PhenomPv2 model [12,13], and to the RIT fit [52] in recent parameter estimation work of the LIGO-Virgo collaboration [4,42,69] (including spin evolution according to [41]). Applying Eq. (20) to our aligned-spin fit, we find a small overshooting of the |χ f | ≤ 1 Kerr bound for mass ratios q 24, when the spin magnitude of the heavier BH is very close to extremal, and for certain orientation angles θ i of the black holes' spins to the angular momentum. The worst cases give an excess in χ f of about 0.12% at q ∼ 60 and intermediate opening angles, comparable to the aligned-spin fit residuals. No overshooting occurs if only the linear-in-η term in the final spin is used. Such a small inaccuracy when extending the alignedspin fit to precessing cases is in principle not surprising, as this parameter-space region is not covered with NR simulations and hence the fit slope in this region is purely determined by extrapolation between the NR data and the extreme-massratio limit, which we have ensured to be smooth with a flat approach to χ f = 1 at (η = 0, χ 1 = 1) (see Sec. II B and Fig. 14). Very small inaccuracies in the intermediate-η extrapolation region can thus lead to a minimal Kerr violation when adding the in-plane spins according to Eq. (20). A clean solution to this issue would require more calibration NR simulations in the critical region and a study of precessing spin contributions in the extreme-mass-ratio limit.
However, as the overshooting is very small, we have investigated two easy ad hoc solutions: We could take the worst-case point and enforce our 3D fit to be at or below the corresponding L orb value for the aligned-spin projection χ 1 = cos(θ 1 ) by putting a constraint on one of the f i2 coefficients. This can remove the overshooting at the worst-case point and nearby, but not over the whole problematic region, as the fit still has enough freedom in other parameters. But the accuracy of the aligned-spin fit already suffers from this one extra constraint, and using constraints on more than one coefficient to pull down the augmented χ f over a wider parameter region is fully prohibitive because insufficient freedom will remain in the fit to properly calibrate to the actual NR data.
Hence, we opt for an even simpler solution, truncating the augmentation from Eq. (20) at unity: χ f = min χ aug f , 1.0 . This is justified as the overshooting is very small, on the order of the fit residuals, and limited to an extremal parameter-space region. The need for this ad hoc truncation will reduce or become obsolete when low-η-high-spin NR simulations and/or precessing extreme-mass-ratio information become available. A detailed comparison of fit accuracies over a representative set of precessing NR runs is left to future work.

IV. FINAL MASS AND RADIATED ENERGY
To fit the final mass of remnant BHs from BBH mergers, we use the same hierarchical approach as for the final spin, summarized in Fig. 1, with only minor modifications. The quantity we are going to fit here is the dimensionless radiated energy, E rad = M − M f = 1 − M f . For η = 0, it has to vanish even in spinning cases, while the analytical expectation for the leading order in η, as η → 0, is E rad (η)/η ∼ 1 − 2 √ 2 /3. We construct the two-dimensional E rad η, S ansatz as a product of the 1D ansätze, instead of a sum as in the final-spin case. In principle, the fitting procedure is robust enough to use either a sum or product ansatz for either final-state quantity. Actually, by carrying out the full procedure for E rad as described in the following subsections, but using a sum of the 1D contributions, we found a fit statistically at least competitive with that obtained from a product. However, in this case the sum ansatz tends to produce suspicious curvature in the S = 1, low-η region, which cannot be suppressed by the extrememass-ratio information. With additional NR data in this region, that problem might be alleviated, but with the current data set we find the product ansatz to be more robust and able to yield a final fit that is both accurate and well determined  over the calibration region and without obvious bad behavior in extrapolation.

A. One-dimensional fits
For the nonspinning 1D fit in symmetric mass ratio η, a simple fourth-order polynomial with three free coefficients, listed in Table VII, is marginally preferred by both AICc and BIC. More complicated rational functions are not able to yield any significant change in residuals (only up to 1% in RMSE), while the differences between Eq. (21) and the next-ranked fits are again much smaller than the remaining residuals, as shown in Fig. 15.
For the effective-spin dependence, again the value at η = 0.25, S = 0 is fixed from the η fit. A rational function of order (3,1) is top-ranked by AICc, BIC and RMSE and thus unambiguously selected as the preferred ansatz:  with four free coefficients listed in Table VIII, and wellbehaved residuals as seen in Fig. 16.

B. Two-dimensional fits
For the 2D ansatz, we combine the two 1D fits from Eqs. (21) and (22), expanding each S -dependent term with a polynomial in η, according to Eq. (9), and removing the η = 0.25, S = 0 value from the spin ansatz before multiplying with the η terms: Contrary to the sum ansatz for χ f in Eq. (10), we do not need to set the η-independent coefficients f i0 of the S terms to zero, as the E rad η, S = E rad (η, 0) (1 + . . . ) form of Eq. (23) already guarantees the correct η = 0 limit. Hence an expansion up to third order in η of each S term, as we chose for the χ f fit, would yield too many free coefficients, and instead we only expand up to second order. The four f i2 coefficients are again fixed by the equal-mass boundary conditions: FIG. 17. Radiated energy: extreme-mass-ratio comparison of analytical results, the previous PhenomD radiated-energy fit of [9], and this work. Note that, constrained to capture the steep rise at S → +1, the new fit actually deviates slightly more from the analytical result at low positive S . This could be avoided with a more complex 1D S ansatz, which is however disfavored by the current NR data set.  18. Two-dimensional E rad η, S fit, visualized as E rad η, S /η. Application of the extreme-mass-ratio limit helps in avoiding extrapolation artifacts which would otherwise appear at low-η, high-| S | regions that are uncovered by NR simulations.
Similar to the procedure for χ f , we can use the extrememass-ratio limit to fix the four coefficients f i0 of the linear-inη terms. Using the analytic result from Eq. (2)a, we force the fit to satisfy the equality and fit the corresponding leading-order η dependence of our FIG. 19. Examples of spin-difference behavior of the radiated energy at fixed mass ratios, for residuals ∆E rad after subtracting the twodimensional E rad η, S fit. Top row: q = 1 (mirror-duplicated data points shown in gray); lower row: q = 4; left column: surfaces in S , ∆χ, ∆E rad space; right column: projections unto the ∆χ axis with linear and quadratic fits. At equal mass, the linear term and mixture term vanish, but the expected quadratic dependence (parabolic surface) is less clearly pulled out from rather noisy residuals than for the final spin (cf. Fig. 9). At q = 4 and other intermediate mass ratios, the surface is not as close to flat as in the final-spin case, and the noisy data still shows some quadratic dependence.
2D ansatz to discretized values of this quantity. Again we fix one of the four free coefficients of E rad η → 0, S by a constraint fixing the value at S = 1, which is necessary to capture the very steep rise of Eq. (25) as S → +1: The agreement between discretized analytical result and fit is shown in Fig. 17, and fit coefficients are listed in Table IX. We thus have 12 − 4 − 4 = 4 free coefficients f i1 , of which f 21 turns out to be extremely poorly constrained, so that we set it to zero before refitting. Results of the 2D fit, calibrated to equal-spin simulations only, are shown in Fig. 18, which shows that the steep shape of the extreme-mass-ratio limit at high S is smoothly attained by the extrapolated fit. For the curvature at low η and extremal S = 1, where there is no NR data, there might be also a contribution from the small remaining fit issues in the extreme-mass-ratio limit (cf. Fig. 17). The residuals again have larger RMSE than the 1D fits in η and S , by factors of 6.5 and 1.8 respectively, but show no clear apparent trends, allowing us to use this 2D fit as the basis for an unequal-spin residuals study in the next step.

C. Unequal-spin contributions and 3D fit
The spin-difference dependence of unequal-spin residuals is less clear here than for the final spin: As seen in the examples of Fig. 19, the general trend is the same with a quadratic dependence on ∆χ at equal masses and more dominant linear effects as η decreases, but the distributions are generally noisier and the second-order terms (quadratic and mixture ∝ S ∆χ) cannot be as cleanly separated.
For both the per-mass-ratio-step analysis and the direct 3D fit, we use the same general functional forms for possible linear, quadratic and mixture terms as in Eqs. (15), (17) and (18). After fixing ill-constrained coefficients to integer values, these reduce to c Figure 20 shows that the linear term is again robustly determined and does not change shape much when adding the two additional terms, but already the per-mass-ratio and direct-3D fits for this term do not agree quite as closely as in the χ f fit. The quadratic term is more noisy, and for the mixture term the FIG. 20. Spin-difference behavior of radiated-energy data after subtraction of the two-dimensional E rad η, S fit, for the three ansatz functions A i (η) from Eq. (27), with the same mass-ratio steps and fits as in Fig. 10. Top-left panel: linear term A 1 only. The remaining panels are for the combined linear+quadratic+mixture fit, in clockwise order: linear term A 1 , quadratic term A 2 and mixture term A 3 . The A 1 results from the combined fit are very similar to those from the linear-only fit, demonstrating the robustness of extracting leading-order spin-difference effects. For the two lower panels, results are much more uncertain, and the error bars for low η go far outside the displayed range, so that this region does not contribute significantly to the weighted per-mass-ratio fits.
results are rather uncertain, with an apparent sign change in the effect over η, but the stepwise cross-checks at least agreeing on the overall shape. Still, we will see below that inclusion of both these effects is statistically justified.
The full 3D ansatz for E rad η, S , ∆χ is then built up as E rad η, S , ∆χ = E rad η, S + ∆E rad η, S , ∆χ , and this time has eight free coefficients (three from the 2D ansatz and five from the spin-difference terms). Results for the fit to 298 NR cases not previously used in the 1D fits are listed in Table X

D. Fit assessment
Tbl. XI gives a statistical summary of the various 1D, 2D and 3D fits for E rad as discussed in this section. We find that the full linear+quadratic+mixture fit to all data has better RMSE than the simpler versions, and actually on the same level as the 2D equal-spin fit to equal-spin data only. The additional coefficients are also justified by yielding the best AICc and BIC, though not as clearly as for the final-spin case in Table V. Thus, we choose this three-term ansatz, matching the final-spin choice. Also, from Table X we see that all coefficients of this ansatz are sufficiently well constrained, with a worst standard error of 21.6%, to be useful for applications, at least under the assumptions made for the ansatz terms in Eq. (27). Just as was the case for the final-spin fit, and even more so because of the noisier data, we caution that ambiguity over the exact shape of the spin-difference terms remains, because different choices of the exponents and expansion orders in Eq. (27) can yield statistically indistinguishable fits.
As an additional check on the overall functional behavior of the fit, in Fig. 21 we show again comparisons between the intermediate 2D fit calibrated to equal-spin data only, a 2D fit to all data (not working well, as expected) and the 2D part of the final 3D fit. The equal-spin 2D fit and 3D fit agree well, with the strong excess curvature of the all-data 2D fit in the high-S region reduced significantly thanks to the unequal-spin terms.  Table V. FIG. 21. Green: Difference ∆E rad of a 2D fit to the full radiated energy data set minus the 2D fit to equal-spin cases only, both including extreme-mass-ratio constraints. Orange: Difference ∆E rad of the 2D part of the 3D fit to the full data set minus the 2D-only fit to equal-spin data. The residuals of the new E rad fit are shown over the η, S parameter space in Fig. 22 and as histograms in Fig. 23, compared with several previously published fits [9,19,52]. Comparison statistics are also summarized in Table XII. Again we find significant improvement, by a factor of 5 in residual standard deviation over the simple two-coefficient fit of [19] and about 40% improvement over the previous PhenomD fit of [9] and the RIT fits of [52,61]. As the maximum possible emitted energy fraction -for equal-mass BBHs with extremal positive spins -the new fit predicts E rad ≈ 0.1142, consistent with the fit from [70] which was specifically calibrated to equal-mass-    Fig. 2. Also listed are a refit of the PhenomD [9] ansatz to the present NR data set, a refit of our hierarchically obtained ansatz directly to the full data set, and results with the same fitting procedure, but using uniform weights.
equal-spin cases and with [9,52], but 15% higher than [51], underlining the importance of high-order S terms at extremal spins.

V. CONCLUSIONS
We have developed a hierarchical step-by-step fitting method for results of numerical-relativity simulations of BBH mergers, assuming nonprecessing spins and negligible eccentricity, so that the parameter space is given by the mass ratio and two spin components: (η, χ 1 , χ 2 ). We have then applied the method to the spin χ f and mass M f of the remnant Kerr BH, the latter being equivalent to radiated energy E rad .
An appropriate fit is constructed in simple steps which reduce the dimensionality of the problem, with the ansatz choice at each step driven by inspecting the data. The full higherdimensional fit is then built in a bottom-up fashion, modeling each parameter's contribution in order of its importance, as illustrated in the flowchart of Fig. 1.
A key goal of our approach is to avoid overfitting. To achieve this, and to evaluate the quality of the fits, we use in-formation criteria (AICc and BIC) described in Appendix B. At least as important, however, is also the hierarchical datadriven nature of our procedure, e.g. when modeling the subdominant dependence on the difference between the spins. Through a reduction to one-dimensional problems, and inspecting the data at different mass ratios, an appropriate model with a small number of parameters is easily identified.
We compare our results with previous fits in the literature, also refitting these previous models to our data set, and we find a clear preference for the new fits in terms of residuals and information criteria.
Our emphasis on inspecting the underlying data set, and comparing its quality with the statistical analysis of fit errors, highlights that for further improvement of these numerical fits, it will be essential to understand the uncertainties and systematic effects in a data set, and to cleanly combine data from different codes, rather than simply increasing the number of calibration simulations without controlling the error budget.
In addition to the quality of the fit as applied to the available numerical relativity data, we investigate extrapolation properties beyond the calibration region, in particular for extreme spins, where we check that our fit does not overshoot the extreme Kerr limit, and avoids pathological oscillations. The extreme-mass-ratio limit has previously been incorporated into fits for final mass and spin in different ways, in particular for the final spin, where the influence of radiated energy is not always accounted for (e.g. in [47,52]). Doing so is however important to avoid oscillatory features due to an unphysical value of the derivative with respect to the symmetric mass ratio η at η = 0, and indeed we achieve a more robust behavior for close-to-extremal final spins in comparison to other recent fits [52,55] -see Fig. 14. In order to avoid numerical errors in this derivative, we use an analytical expansion around the (η = 0, S = 1) corner.
We have also verified that our fits for final mass and spin are consistent with the Hawking area theorem for black holes [71]: the area of the final horizon is larger than the sum of the individual horizons of the initial black holes. The difference in areas is much larger than the fit errors, thus the area theorem does not provide an additional constraint on the fits.
For both χ f and E rad , we can robustly identify the main unequal-spin contribution of the form f (η) (χ 1 − χ 2 ). The shape of the function f (η) is well determined for final spins, but has larger errors for the smaller unequal-spin contribution to radiated energy. We also discuss and find statistical evidence for two further unequal-spin terms, one quadratic in ∆χ, and a mixed S ∆χ term, which however are comparable in size to errors in the numerical data, so that their η-dependent shape cannot be tightly constrained with the current data set.
The same hierarchical fitting procedure can be easily applied to other quantities such as peak luminosity [72,73]. An important goal of our work is the accurate calibration of unequal-spin effects for full inspiral-merger-ringdown waveform models, in particular toward extending the PhenomD model [9,10]. This will allow investigating under which conditions gravitational-wave observations can reveal the individual spins of a coalescing binary. While no fundamental modifications are necessary to the method itself, a careful analysis of the data sets will be required to make appropriate choices in combining the 1D subspace fits, including extreme-mass-ratio information, and adding spin-difference terms. A more ambitious goal for the future is the extension of our hierarchical fit construction to the higher-dimensional problems of generic precessing BBHs. For NR calibration of BBH coalescence models, it is useful to combine data sets from different research groups and numerical codes, both to increase robustness against code inaccuracies and errors in the preparation of data products (such as incorrect metadata), and to benefit from the combined computational resources of different groups. Computational cost increases significantly with mass ratio and spin, thus the highmass-ratio and high-spin regions are still poorly sampled, and numerical errors are often higher (as we will see below). Very different spins on the two black holes will typically also increase computational cost due to the need to resolve different scales in the grids around the two black holes (higher spin leads to smaller black holes for typical coordinate gauges in numerical relativity), which is a challenge for accurately capturing small subdominant effects like the nonlinear spindifference effects we discuss in this work.
In this Appendix, we discuss our procedures to eliminate data points of poor quality, to assign fit weights, and to check consistency between assumed error bars and our fit results. We expect two main avenues to significantly improve over the fits we have presented in this paper: (a) providing more data points with high spins and unequal masses, in order to improve the accuracy of the fit near the boundaries of the fitting region and to reduce the need for extrapolation; and (b) determining more accurate and robust error bars for NR data, which would allow one to isolate small subdominant effects.
Final spin and final mass are usually computed as surface integrals over the apparent horizon using the isolated-horizon formalism [74] (see [75] for a summary of methods and references for different codes), from surface integrals over spheres FIG. 24. Differences between radiated energy computed from either horizon or waveform data, across the parameter space. The color scale quantifies the differences between the two computations. Differences are largest for high-mass-ratio and high-spin cases, where high NR accuracy is more demanding.
at large or infinite radius (as in [64]), or from the energy or angular momentum balance computed from initial and radiated quantities (see Eq. (A1) below). Final spin and final mass can also be obtained from fits to the ringdown [76].
Final mass and spin from the apparent horizon (AH) are generally expected to be more accurate than those based on the evaluation of asymptotic quantities (such as Bondi mass, angular momentum, or radiated energy and angular momentum) at finite radius, where errors may arise due to finite radius truncation, insufficient extrapolation to infinity or numerical inaccuracies in propagating the wave content to large distances at sufficient numerical resolution. On the other hand, AH quantities may suffer from inaccuracies in finding the apparent horizon and from gauge ambiguities.
For SXS, RIT and GaTech results, we take the values provided in the catalogs [52,59,60,62,63], which in general have been computed at the horizon. For a comparison of horizon vs radiated quantities for the RIT results, see Table V in [52], where error bars for the horizon quantities are significantly smaller. For the BAM code, for some large-mass-ratio cases the AH finder fails due to the unfortunate choice of a shift condition, which results in a coordinate growth in the horizon which is roughly linear in time during the evolution. After several orbits the horizon of the larger BH is then no longer contained within the fine grid of the mesh refinement, which may trigger a failure of the horizon finder code. Due to the high computational cost of the simulations, we have not rerun these cases with improved parameters for the apparent horizon finder code. But rather, we compute the final angular momentum from the angular momentum surface integral at large radius, and the energy from the radiated GWs. For processing GaTech waveforms we have also followed [77].
For all 414 cases where we have both the waveform and AH estimate available, we perform cross-checks between AH quantities and those obtained from integrating radiated energy and angular momentum. In order to compute the final mass from the radiation quantities, we integrate 25. Histograms of the differences between radiated energy computed from either horizon or waveform data (as in Fig. 24), compared with the residuals of the new radiated-energy fit (as in Fig. 23).
over time, starting after the initial burst of "junk radiation". We extrapolate the result from different extraction radii to infinity at linear order in inverse radius. To account for energy radiated at separations larger than the initial separation of the NR simulation, we compute the radiated energy at 3.5 PN order [78][79][80][81][82][83][84] from ω ∈ [0, ω 0 ], with ω 0 being the initial orbital frequency of the NR simulation (after junk radiation). Then we can consider the differences between radiated energy values from the horizon and from the integrated waveforms, shown in Fig. 24, as an estimate of NR errors. However, this will typically be a pessimistic estimate because horizon quantities are in general more reliable and thus big differences are typically caused by inaccuracies in the integrated emission. In Fig. 25 we show that the distribution of this pessimistic estimate is similar to, but much wider-tailed than, the residuals from our radiated-energy fit.
A more realistic measure of NR errors is the difference between results from different codes for equal initial parameters. With a strict tolerance requiring equal initial parameters to within numerical accuracy, we find 41 such duplicate configurations out of the total of 427 cases. 1 We evaluate differences between these equalparameter cases for final spin and radiated energy. Figure 26 shows that, with strict tolerance, these error estimates (standard deviations of 3.1 × 10 −4 for χ f and 1.6 × 10 −4 for E rad ) 2 are still on the same order but smaller than the respective fit residuals (RMSE of 5.2 × 10 −4 for χ f and 2.2 × 10 −4 for E rad ). However, the set of true duplicates is small and mostly concentrated in equal-spin-similar-mass regions of the parameter space (cf. Fig. 27), preventing us from naively extrapolating this error estimate to the full parameter space. Hence we consider it as a somewhat optimistic estimate of final-state NR errors. We therefore have a rough expectation for the range of possible NR errors bracketed by these pessimistic and optimistic estimates, but no detailed information for each case over the whole parameter space. Instead, we use simple heuristic fit weights. The overall scale of the NR error is not relevant for determining fit weights, so we only need to assign relative weights between the cases, emulating the usual quadratic scaling with data errors which can also be deduced from Fig. 24. For SXS data we down-weight cases with η < 0.1 by a factor of 2 2 ; while for the puncture codes (BAM, GaTech, RIT) we expect larger inaccuracies especially at low η, and so we FIG. 28. Unequal-spin effects for final spin χ f at q = 5, shown as residuals against the 2D equal-spin fit (cf. Fig. 9). The three points highlighted in red are similar configurations from the GaTech catalog, for which it has since been confirmed [85] that the sign of χ f should be negative instead, making L orb fit with the trend -corrected values are shown in green.
down-weight by a factor of 2 2 above η = 0.223 and 3 2 below that mass ratio, and a factor of 5 2 below η = 0.05 (including the computationally challenging q = 18 cases). As mentioned before, a more detailed NR error study, leading to better-determined weights, would be a clear avenue to further improve fit results.
From the original set of NR simulations we have removed 16 cases as outliers, which are listed in Table XIII. For this decision, we have considered three main sources of outliers: cases whose NR setup is not appropriate for the purpose of this study, duplicated configurations for which the variations in the final quantities are much larger than the RMSE, and cases that are found to be drastically off the trend of otherwise smooth data sets in any of the one-dimensional plots in our hierarchical fitting procedure. Outliers 1,2,5,16 have rather short orbital evolutions, so that they can be used for ringdown-only studies, but not for our purpose of predicting the final state from initial parameters. For outliers 10-12 and 15-16 we have found large variations in the final-state values for different codes (see Fig. 26). Here we have used only the equivalent SXS configuration, in each case corresponding to longer and presumably more accurate evolutions. The remaining seven outliers have been identified after performing the step-by-step one-dimensional analysis of the data, each deviating so clearly that there must be an underlying systematic problem and not just a statistical fluctuation (in which case they could not be excised from the data set). As an example, we highlight in Fig. 28 three clear GaTech outliers found in the unequal-spin calibration step; however, it was recently confirmed [85] that these three cases should have a negative sign of their final spin, and with this change they are fully consistent with our fits. We note that the overall data quality of the omitted cases may be perfectly adequate for other studies; while for this final-state study, due to good data coverage in the corresponding parameter-space regions and clear global trends in the full data set, the consistency requirements are quite narrow.
We also summarize in Table XIV [9]. For each simulation, we list mass ratio q = m 1 /m 2 , initial spins χ 1 and χ 2 , reference orbital frequency Ω 0 , initial separation D 0 (after junk radiation), eccentricity e, radiated energy E rad (scaled to unit initial mass) and dimensionless final spin χ f .

Appendix B: Fit assessment and model selection criteria
Both while constructing the full ansatz in our hierarchical process, and when selecting the final fit, we rank fits by several standard statistical quantities, which are briefly summarized here for the benefit of the reader.
A basic figure of merit is the root-mean-square-error, X NR (η n , χ 1,n , χ 2,n ) − fit(η n , χ 1,n , χ 2,n ) 2 , (B1) which just checks the overall goodness of fit. One caveat here is that down-weighted NR cases are fully counted in the RMSE, so that a generalized variance estimator using weights can be more useful.
Furthermore, it is important in model selection to penalize models with too many free coefficients, as in principle the RMSE can be made arbitrarily small when the number of coefficients approaches the number of data points. A popular figure of merit for model selection considering the number of coefficients is the Akaike information criterion [45], which intuitively can be understood as weighing up goodness of fit (measured by the maximum log-likelihood L max ) against parsimony. Standard implementations, as the one from Wolfram Mathematica, assume Gaussian likelihoods. A generalization that corrects the AIC for low numbers of observations and reproduces it for large data sets is the AICc: In this work, we always use AICc instead of AIC. A related quantity, similar in form but with a completely different theoretical justification and with subtle differences in practice, is the Bayesian information criterion or Schwarz information criterion [46]: BIC = −2 ln L max + N coeffs ln(N data ) . (B4) Though based on an approximation to full Bayesian model selection (while the AIC is derived from information theory), the BIC in general cannot be interpreted as a direct measure of Bayesian evidence between models. There is much literature on advantages and disadvantages of these two criteria, and several other alternatives exist -see e.g. [86] and references therein. In practice, the BIC tends to impose a slightly stronger penalty on extra parameters than the AIC(c). Both criteria have been criticized [86] for not only penalizing completely extraneous parameters, but also the existence of degeneracies between parameters. However, this is a virtue rather than a problem for our purpose of selecting parsimonious model ansätze.
For all of AIC, AICc and BIC, the model with the lowest value is preferred. Higher than unit differences between two models are generally required to count as significant evidence; [86] quotes ±5 as "strong" and ±10 as "decisive" evidence.
By default, we rank the one-dimensional η and S ansätze by BIC, and apply the same criterion to judge how many ∆χ terms to include in the final 3D model. In general these are not guaranteed to be the best by AICc or RMSE as well, but in practice we check that the three criteria give a very similar ranking of models.
Still, we find that sometimes the best fit by any of these three criteria (RMSE, AICc, BIC) can suffer from one or several parameters being not well constrained. In parameter estimation over large measured data sets, such as the CMB example in [86], this is a desirable feature of model selection, as it allows to assess which physics is actually constrained by the data. However, in our case we are well aware that our data set does not constrain all possible functions over the parameter space, and we are more interested in reporting a wellconstrained model that captures the information in the data set than to dig out weak constraints on possible extensions of that model. So we augment our model selection criteria by considering also the well-constrainedness of each individual fit coefficient, and allow for picking a fit with slightly worse summary statistics if it has better-constrained coefficients; or we drop individual coefficients from a high-order ansatz and reassess the quantitative criteria for that reduced model. BIC for the one-dimensional L orb η, S = 0 fits from Sec. III B. The inset panel is a zoom-up of the top-ranked fits. The tested set of ansätze includes all polynomials from second to seventh order in η and all rational functions of order (i, j), j ≤ i, up to i + j = 6. The preferred ansatz, a rational function of order (3,1) with three free coefficients, is highlighted.
As an example of how e.g. the BIC can guide model selection, we show in Fig. 29 the BIC ranking for the onedimensional L orb η, S = 0 fits from Sec. III B. A plateau of almost constant BIC is made up of several fits with N coeffs ≥ 3, with the more complex fits yielding no additional improvement, so that we choose the simplest fit among this group. Still, even if it had not come up actually top-ranked, as in this case, choosing a low-N coeffs fit from within the high-ranked group would be preferable over some slightly higher-ranked, but less-well-constrained fit.  for the fits using S and χ eff than for that using S , which shows some certainly nonphysical oscillations.
The conclusion is that the hierarchical fitting method is quite robust under a change of effective-spin parametrization, and indeed we would expect full equivalence in the limit of a huge data set with small, completely known NR errors (using appropriately adapted ansätze for each parametrization). With the current data set, S and χ eff perform similarly, while when using S additional high-spin data would be even more important to ensure smooth extrapolation.

Appendix D: Fit uncertainties
The uncertainty of evaluating a fitted quantity Q at a point (η, χ 1 , χ 2 ) can be expressed through prediction intervals [87] Q (η, χ 1 , χ 2 ) ± q t (x, N data − N coeffs ) σ 2 + σ 2 fit , where q t is the student-t quantile for a confidence level x, σ 2 is the error variance estimator from the (weighted) meansquare error of the calibration data under the fit, and σ 2 fit is the standard error estimate of the fitted model, which for a singlestage fit is σ 2 fit = grad t (η, χ 1 , χ 2 ) · C fit · grad (η, χ 1 , χ 2 ) (D2) with the gradient vector grad (η, χ 1 , χ 2 ) of the fit ansatz in the coefficients, evaluated at this point, and the covariance matrix C fit of the fit. Note that Eq. (D1) gives the uncertainty for a single additional observation, as opposed to the narrower confidence interval of the mean prediction, which lacks the σ 2 term. In our hierarchical fitting approach, to propagate the uncertainties from the nonspinning, equal-mass and extreme-massratio limits, we have to assume that the uncertainties in these regimes and that of the final fit are independent, so that we can take the full covariance matrix as a block-diagonal composition of these four contributions. The half width of a prediction interval at confidence x is then q t (x, N data − N coeffs ) σ 2 + σ 2 final + σ 2 η + σ 2 S + σ 2 η=0 . (D3) As these three particular regimes are significantly better constrained than the bulk of the parameter space (which is the main motivation for the hierarchical approach, in the first place), their uncertainty contribution is small, so that the accuracy of this approximation is not critical. The covariance matrices for the fits to both final spin and radiated energy are provided in ASCII format as supplementary material. The estimated variances are σ 2 χ f ≈ 6.225 × 10 −8 for final spin and σ 2 E rad ≈ 2.061 × 10 −8 for radiated energy.