Strategy and processing speed eclipse individual differences in control ability in conflict tasks

The construct of response control or response inhibition is one of the cornerstones of modern cognitive psychology, featuring prominently in theories of executive functioning and impulsive behaviour. However, repeated failures to observe correlations between commonly applied tasks have led some theorists to question whether common or overlapping response conflict processes even exist. A challenge to answering this question is that behaviour is multifaceted, with both conflict and non-conflict processes (e.g. strategy, processing speed) contributing to individual differences. Here, we use a cognitive model to dissociate these processes: the diffusion model for conflict tasks (Ulrich et al., 2015). In a meta-analysis of fits to 7 empirical datasets, we observed weak (rho<.05) correlations between tasks in parameters reflecting conflict processing, seemingly challenging a general control construct. However, we saw consistent positive correlations in parameters representing processing speed and strategy. We then use model simulations to evaluate whether correlations in behavioural costs are diagnostic of the presence or absence of common mechanisms of conflict processing. We compare correlations in simulated behaviour in scenarios where we impose correlations in conflict parameters to scenarios in which only non-conflict parameters are correlated. We find that correlations in behaviour are neither necessary nor sufficient evidence for correlations in conflict parameters. Our data provide converging evidence for claims that non-conflict processes contribute substantially to individual differences in conflict tasks, and illustrate that correlations between conflict tasks are only weakly informative about common conflict mechanisms.


Strategy and processing speed eclipse individual differences in control ability in conflict tasks
The ability to control our responses in the presence of conflicting information is a core facet of executive functions (Miyake et al., 2000). Response control (sometimes called response inhibition or attentional control) is typically measured in commonly used paradigms such as the Stroop (Stroop, 1935), the Eriksen flanker (Eriksen & Eriksen, 1974), Simon (Simon & Rudell, 1967), and the antisaccade (Hallett, 1978) and stop-signal (Logan, 1994) tasks.
In both theoretical and applied work, it is common to assume either a common underlying response control trait, or some degree of overlap in response control mechanisms underlying different tasks (for a review, see Bari & Robbins, 2013). However, the assumption of common mechanisms has received inconsistent support from correlational studies, with performance in different control tasks showing inconsistent or absent correlations with each other (Aichert et al., 2012; Friedman & Miyake, 2004; Hamilton et al., 2015; Hedge, Powell, & Sumner, 2018b; Ivanov, Newcorn, Morton, & Tricamo, 2011; Stahl et al., 2014; Wager et al., 2005). This has led some theorists to question the value of inhibition as a psychometric construct (Rey-Mermet, Gade, & Oberauer, 2018), which has serious implications for both theoretical work and for the applications of the construct to clinical domains.
Evaluating whether a common and useful 'inhibition' construct exists is obstructed by a key challenge: the way performance is typically measured may be suboptimal for examining individual differences even if the trait does exist (Draheim, Hicks, & Engle, 2016; Hedge, Powell, & Sumner, 2018b). There is a habit in psychology of using performance in key tasks as proxies for underlying mechanisms, such as memory, attention or control (c.f. Verbruggen, McLaren, & Chambers, 2014). Accordingly, researchers using tasks like the Stroop and flanker routinely use differences in reaction times (RTs) or error rates as an index of an individual's inhibitory ability. But the ingredients of performance are multifaceted, and individual variation does not necessarily come from the same source as the well-studied within-subject effects (Boy & Sumner, 2014). For example, although the main cause of the Stroop effect is conflict, individual differences in the size of the Stroop effect could come from differences in strategy, language processing or even visual acuity (e.g. not wearing your glasses), rather than the ability to control conflict.

Strategy and general processing speed contaminate measures of inhibitory ability
We recently conducted a meta-analysis that illustrated the problem of measuring individual differences in inhibitory ability. In the literature, some tasks use RT costs and some use error costs as their main performance measure, and it is generally assumed that subtracting conditions to produce a 'cost' removes speed-accuracy strategy effects. However, across a wide range of tasks, RT costs and error costs taken from the same task show little correlation (r = .17; Hedge, Powell, Bompas, Vivian-Griffiths, & Sumner, 2018). In other words, if we were to rank individuals from best to worst in inhibitory ability based on their Stroop cost in RTs, we would come to a very different ordering than if we used the Stroop cost in errors. To some extent, low correlations between RT costs and error costs are to be expected following evidence that they have sub-optimal reliability, which attenuates correlations (Enkavi et al., 2019; Hedge, Powell, & Sumner, 2018b; Paap & Sawi, 2016).
However, this does not fully account for the low and inconsistent pattern, with significant negative correlations sometimes observed between the two purported measures of the same ability. We explain this in the framework of evidence accumulation models (e.g. Brown & Heathcote, 2008; Ratcliff, 1978). We assume that individuals differ in at least two dimensions. The first is their ability to select the correct response based on the task-relevant information.
Individuals who are 'better' at inhibiting conflicting information should show both smaller RT costs and error costs, leading to a positive correlation. The second is their strategy, reflecting how much information they wait for before they make a decision. Individuals who are more cautious produce larger RT costs and smaller error costs, leading to negative correlations. Critically, the traditional approach of subtracting conditions does not remove strategy effects, which can mask individual differences in inhibitory ability (Hedge, Powell, Bompas, et al., 2018).
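The opposing effects of caution on RT costs and error costs can be illustrated with a minimal drift diffusion simulation. This is our sketch with arbitrary illustrative parameters (it is not the model fit later in the paper): holding drift rates fixed, raising the decision boundary produces a larger RT cost but a smaller error cost.

```python
import numpy as np

def simulate_ddm(drift, bound, n_trials=3000, dt=0.001, sigma=1.0, seed=1):
    """Simulate a symmetric two-boundary diffusion process starting at 0.
    Returns decision times (s) and whether the correct (upper) boundary was hit."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)                   # accumulated evidence
    rt = np.full(n_trials, np.nan)
    correct = np.zeros(n_trials, dtype=bool)
    alive = np.ones(n_trials, dtype=bool)
    t = 0.0
    while alive.any() and t < 15.0:
        x[alive] += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        t += dt
        hit = alive & (np.abs(x) >= bound)
        correct[hit] = x[hit] > 0            # upper boundary = correct response
        rt[hit] = t
        alive &= ~hit
    return rt, correct

def costs(bound):
    """RT cost and error cost for hypothetical congruent (drift=2) vs
    incongruent (drift=1) conditions at a given boundary setting."""
    rt_c, acc_c = simulate_ddm(drift=2.0, bound=bound, seed=1)
    rt_i, acc_i = simulate_ddm(drift=1.0, bound=bound, seed=2)
    return np.nanmean(rt_i) - np.nanmean(rt_c), np.mean(~acc_i) - np.mean(~acc_c)

rt_cost_fast, err_cost_fast = costs(bound=1.0)          # liberal criterion
rt_cost_cautious, err_cost_cautious = costs(bound=2.0)  # cautious criterion
# The cautious setting yields a LARGER RT cost but a SMALLER error cost.
```

Because the same boundary parameter moves the two costs in opposite directions, subtracting conditions cannot remove its influence, consistent with the argument above.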
Theorists have also argued that general processing speed confounds the measurement of response control (Miller & Ulrich, 2013). A re-analysis of several factor analytic studies observed that individual differences in conflict tasks can be accounted for by a general processing speed factor, without need for a separate inhibition factor (Jewsbury, Bowden, & Strauss, 2016; see also Friedman & Miyake, 2017; Rey-Mermet et al., 2019). In an evidence accumulation framework, we have shown that greater efficiency in general information processing produces smaller RT costs and error costs, thus manifesting in the same way as greater inhibitory ability (Hedge, Powell, & Sumner, 2018a).
Taken together, the literature paints a challenging picture for assessing whether common mechanisms of inhibition or conflict processing exist. The size of an individual's RT and/or error cost in a given task reflects some unknown combination of their ability to overcome conflict, their strategy, and other processing abilities. The relative contribution of these processes to behaviour will differ between tasks, or variants of the same task (Hedge, Powell, Bompas, et al., 2018;Unsworth, Schrock, & Engle, 2004). This leads us to the question of where (if at all) there are common mechanisms underlying individual differences in performance in response control tasks. To reframe the question, if common mechanisms of inhibition or conflict processing did exist, would we know? For this question, a cognitive modelling approach provides a potentially valuable window into the processes underlying conflict tasks.

Overview of the paper
Our main aim in the first part of this paper is to apply a cognitive model (the diffusion model for conflict tasks, DMC; Ulrich et al., 2015) to multiple empirical datasets in order to decompose behaviour into constituent processes. This allows us to examine correlations in parameters representing conflict mechanisms and key non-conflict mechanisms (e.g. processing speed) that have been implicated in the literature. We adopt a meta-analytic approach to maximise power and integrate across datasets. To pre-empt the main findings, we observe no correlation in the model parameters representing conflict processes. We do observe consistent correlations in model parameters representing non-conflict processes (e.g. strategy, general processing speed), providing converging evidence for previous claims (e.g. Jewsbury et al., 2016).
In the final part of the paper, we use the unique ability of a cognitive model to simulate data from a known theoretical position. We ask: if our process model is appropriate, to what extent are performance measures diagnostic of the presence or absence of common mechanisms of conflict processing? Here, we use the DMC to generate data for two hypothetical tasks with a known correlation in parameters of conflict processing. We find that any emergent correlation in performance measures is heavily attenuated by variance in non-conflict processes such as strategy. Further, we observe correlations in performance of a similar magnitude when we impose correlations in non-conflict processes as we do when conflict processes are correlated. The implication of this is that the degree of correlation between conflict tasks is not diagnostic of shared or specific conflict processing: shared mechanisms could be masked, while behavioural correlations could be driven, for instance, by common strategies across tasks.

The diffusion model for conflict tasks.
The DMC (Ulrich et al., 2015) is a mathematical model of choice RT behaviour in conflict tasks, and an extension of the drift diffusion model (DDM; Ratcliff, 1978), a general model of choice RT behaviour. The standard DDM assumes that individuals sample noisy evidence from their environment over time until a criterion level of evidence is reached for one of the two response options. The three main parameters describe the average rate of evidence accumulation (drift rate), the amount of evidence required (boundary separation), and the duration of motor and perceptual processes (non-decision time). Differences in difficulty between conditions are normally captured by differences in drift rate, with lower drift rates for stimuli that are less discernible.
The standard DDM assumes that the average rate of evidence accumulation within a trial is constant, albeit subject to random noise. This makes it unable to capture data patterns that distinguish tasks with automatic response activation from non-conflict choice RT tasks.
First, errors in conflict tasks are typically fast in the incongruent condition (Gratton, Coles, Sirevaag, Eriksen, & Donchin, 1988; Ridderinkhof, 2002), interpreted to reflect the automatic activation of the prepotent response. Second, while mean RTs in incongruent trials are typically slower than mean RTs on congruent trials in conflict tasks, the magnitude of this effect can decrease and even reverse when comparing the slower quantiles of the correct and incorrect RT distributions in the Simon task (De Jong, Liang, & Lauber, 1994).
This behaviour is interpreted to reflect increasing influence of inhibition over time (or decay; Hommel, 1994), which acts to diminish and sometimes reverse the early influence of the automatic activation.
The DMC (Figure 1A-C) accounts for this by assuming that the task-irrelevant feature (e.g. the flankers in a flanker task) is processed via a fast and automatic route; it initially receives a strong activation which is reduced over time. Concurrently, the task-relevant feature (the central arrow in a flanker task) is processed via a slower, deliberate decision route. The controlled route is captured by a drift rate parameter that is held constant across congruency conditions in the DMC. This reflects the assumption that the processing of the task-relevant property of the stimulus is equivalent across all conditions. The drift rate parameter in the DMC can therefore be interpreted as general processing efficiency. The automatic route is implemented as a rescaled gamma function, which captures the assumption that pre-potent stimulus features influence the early phase of the decision process more than the later phase (Figure 1D).
The DMC takes inspiration from the Activation-Suppression hypothesis (De Jong et al., 1994; Kornblum, 1994; Ridderinkhof, 2002), which posits that the automatic activation is removed through active suppression. However, the DMC is agnostic about what drives the reduction in the influence of automatic activation and has no explicit parameter to represent inhibitory ability. Instead, the ability to overcome conflict is implicit in the susceptibility to pre-potent response activation (the amplitude it reaches), or the speed at which automatic activation is removed/decays. The maximum value of the automatic activation is defined by an amplitude parameter, and the time at which the maximum value is reached is defined by a scale parameter; we hereafter refer to the scale parameter as the time-to-peak (following Ulrich et al., 2015)¹. The gamma function also has a shape parameter, but following Ulrich et al.
(2015; see also White, Servant, & Logan, 2017), we fixed this to a constant value for all individuals. Therefore, individuals with more efficient inhibition would be expected to have a lower amplitude and/or a shorter time-to-peak, as these are the parameters that should capture individual differences in conflict processing (Figure 1E and 1F).
¹ Note that the time at which the peak amplitude is reached is only equal to the scale parameter when the shape parameter is fixed to 2 (Ulrich et al., 2015), as was the case here. In general it is defined by: time-to-peak = scale × (shape − 1).

Figure 1. Schematic of the diffusion model for conflict tasks (Ulrich et al., 2015). A) The decision process is implemented as noisy accumulation of evidence to either the upper (b) or lower (-b) boundary, here representing the correct and incorrect responses respectively. Non-decision time (Ter) refers to sensory and motor processes, which occur before and after the decision phase. B) The average rate of evidence accumulation is determined by two underlying processes. The drift rate of the controlled process (μc) represents the efficiency of processing the task-relevant property of the stimulus (e.g. the central arrow in a flanker task).
The amplitude (A) and time-to-peak (tau) describe a rescaled gamma function, which represents the automatic activation and subsequent removal of automatic activation (e.g. the processing of the flanking arrows). Here the automatic activation is depicted for incongruent trials. Increasing the amplitude parameter leads to increased mean RT costs (higher average values of the delta functions on the y-axis). Increasing the time-to-peak (blue vs. black line) produces more positive-going delta slopes. Note the correspondence between the shape of the delta functions and the shape of the automatic activation that produces them (Figure 1D).
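To make the two-route architecture concrete, the superimposed decision process can be sketched numerically. The following is our illustration (in Python, rather than the Matlab used for the reported fits), with parameter values loosely in the range used by Ulrich et al. (2015) and time in ms; the instantaneous drift is the controlled drift plus the time derivative of the (signed) expected automatic activation.

```python
import numpy as np

def automatic_activation(t, A, tau, shape=2):
    """Expected automatic activation E[Xa(t)]: a rescaled gamma function
    that peaks at value A when t = tau * (shape - 1)."""
    return A * np.exp(-t / tau) * (t * np.e / ((shape - 1) * tau)) ** (shape - 1)

def simulate_dmc(sign, n_trials=3000, mu_c=0.5, A=20.0, tau=30.0, b=75.0,
                 sigma=4.0, ter=300.0, dt=1.0, seed=3, t_max=2000.0):
    """Simulate DMC trials. sign=+1: congruent, sign=-1: incongruent."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)                      # accumulated evidence
    rt = np.full(n_trials, t_max + ter)         # RT = decision time + Ter
    resp_correct = np.zeros(n_trials, dtype=bool)
    alive = np.ones(n_trials, dtype=bool)
    t = 0.0
    while alive.any() and t < t_max:
        # finite-difference derivative of the automatic activation at time t
        d_auto = (automatic_activation(t + dt, A, tau)
                  - automatic_activation(t, A, tau)) / dt
        drift = mu_c + sign * d_auto
        x[alive] += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        t += dt
        hit = alive & (np.abs(x) >= b)
        resp_correct[hit] = x[hit] > 0          # upper boundary = correct response
        rt[hit] = t + ter
        alive &= ~hit
    return rt, resp_correct

rt_con, acc_con = simulate_dmc(sign=+1)  # automatic activation aids the correct response
rt_inc, acc_inc = simulate_dmc(sign=-1)  # automatic activation drives the error response
```

In this sketch, congruent trials receive an early boost toward the correct boundary, while incongruent trials receive an early push toward the error boundary that is later withdrawn, producing slower responses and fast errors, the signature data patterns described above.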
We note that our approach here is one of model application, rather than model validation or comparison (Crüwell, Stefan, & Evans, 2019). We assume that there is value in using the DMC as a theoretical framework and use its parameters to inform our question about whether common mechanisms exist. We base this choice on previous demonstrations that evidence accumulation models can inform our understanding of individual differences in cognitive abilities in the context of the confounds we have mentioned (Hedge, Powell, Bompas, et al., 2018; Ratcliff, Thompson, & McKoon, 2015). We selected the DMC over other evidence accumulation models because it has been shown to account for data patterns observed across a range of conflict tasks that other models cannot (Ulrich et al., 2015; White et al., 2017). Testing whether alternative models better account for the patterns of data is beyond the scope of this paper. However, we consider this issue in the discussion, informed by how well the model captures the data patterns we observe.

Rationale
The first question is whether model parameters can reveal correlations between conflict tasks (evidence for common mechanisms) that traditional measures are less able to detect. We answer this question by performing a meta-analysis of 12 task pairs taken from 7 datasets including new and previously published data (Hedge, Powell, Bompas, et al., 2018; Hedge, Powell, & Sumner, 2018b; Hedge, Vivian-Griffiths, Powell, Bompas, & Sumner, 2019; Whitehead, Brewer, & Blais, 2019). We fit the DMC to each task and participant separately to extract model parameters.
Our first criterion for selecting these datasets was that they include some combination of the flanker, Simon and Stroop tasks. We focus on these tasks because they have analogous conflict effects and lend themselves to the DMC framework (c.f. Ulrich et al., 2015). We consider this task selection with regard to theoretical taxonomies of tasks in the general discussion. Our second criterion was that they have sufficient trial numbers. In particular, accurate estimation of parameters in evidence accumulation models requires participants to make a sufficient number of errors.

Datasets
We provide a brief description of each dataset, and Table 1 summarises the key information of each. For full methodological details we refer to Supplementary Material A and the original papers. We draw particular attention to Dataset 3 (Hedge, Powell, Bompas, et al., 2018), which consists of two variants of the Simon task. In one variant, congruent and incongruent trials were randomly intermixed (as is standard for the Simon task), while in the other congruent and incongruent trials were presented in separate blocks (a common format for tasks such as the antisaccade). Normally, when low correlations between conflict tasks have been observed in the literature we do not know whether this is because the mechanisms underlying conflict resolution are task specific, or because of our inability to detect common mechanisms in the presence of other differences in processing between tasks. In Dataset 3, however, the tasks are more closely matched, at least in terms of their surface features. This allows for a more focused test of our ability to detect common mechanisms (for a similar approach, see Snyder, Rafferty, Haaf, & Rouder, 2019).

Note (Table 1). *The authors refer to this as a Simon task, noting that it can also be thought of as a spatial Stroop. We refer to it as a spatial Stroop to distinguish it from the format of the Simon task in datasets 1 & 3. See main text for details.
Dataset 1: Flanker and Simon. Dataset 1 was collected for the current purpose (although it was included in a meta-analysis for a different question; Hedge, Powell, Bompas, et al., 2018). Participants (N=50) completed both flanker and Simon tasks in a single session.
In the flanker task, participants responded to the direction of a centrally presented arrow which could be flanked vertically by arrows pointing in the same direction (congruent), the opposite direction (incongruent), or straight lines (neutral). In the Simon task, participants responded to the colour of a circle presented either on the same side as the response hand (congruent), the opposite side (incongruent), or centrally (neutral). Participants alternated between blocks of each task throughout the session, with the starting task counterbalanced across participants. Prior to testing, participants performed a practice block of 24 trials for each task. A schematic of the tasks used in datasets one to four is shown in Figure 2.

Figure 2. In the Simon task (datasets 1 & 3), participants respond to the colour of the stimulus and ignore the location. In the Stroop task (dataset 2), participants respond to the colour of the font and ignore the written word.

Dataset 2: Flanker and Stroop.
These data were originally collected to assess the test-retest reliability of response control tasks (Hedge, Powell, & Sumner, 2018b). In two studies with identical procedures, participants (combined N = 103) completed four tasks (flanker, Stroop, stop-signal and go/no-go). We do not model the stop-signal and go/no-go tasks here as they are not suited to the DMC framework. The flanker task was as described in Dataset 1. We used a four-choice Stroop task, in which participants responded to the font colour (red, blue, green, yellow) of a centrally presented word. The word could either match the font colour (congruent), refer to a different colour used in the response set (incongruent), or be a non-colour word. Participants completed all the tasks in two sessions, three weeks apart. We combine the data from both sessions and both studies for the analyses reported here. Full details can be seen in Hedge et al. (2018b).

Dataset 3: Intermixed and blocked Simon task. Dataset 3 was previously collected to test the prediction that the correlation between RT costs and error costs would be more positive when congruent and incongruent trials were randomly intermixed compared to separate blocks of congruent and incongruent trials (Hedge, Powell, Bompas, et al., 2018). In a single session, participants (N=102) completed blocks consisting of congruent trials only (two blocks), incongruent trials only (two blocks), and both congruent and incongruent intermixed (four blocks). A mixed-trial block always occurred between congruent-only and incongruent-only blocks, with the starting block counterbalanced across participants.

Dataset 4: Flanker and Stroop.
Dataset 4 was collected to examine the test-retest reliability of the speed-accuracy trade-off induced by instructing participants to emphasise either speed, accuracy, or both speed and accuracy (Hedge et al., 2019). Participants completed the flanker and Stroop tasks as described in datasets 1 and 2, in two sessions separated by four weeks. For each session and task, participants completed 4 blocks each for speed, standard (both speed and accuracy) and accuracy conditions. Participants were shown instructions at the beginning of each block to emphasise the relevant performance dimension.
We combine the data from both sessions for the analyses here.

Datasets 5 to 7: Spatial Stroop. These datasets used variants of a spatial Stroop task (Whitehead, Brewer, & Blais, 2019). In the spatial Stroop task, participants responded to the identity of a directional word (right, left, up or down) which could be presented in a congruent location (e.g. "up" above central fixation) or an incongruent location (e.g. "below" to the right of central fixation). The datasets differed in the ratio of congruent to incongruent trials (Datasets 5 and 7 were 50/50; Dataset 6 was 25/75), and Datasets 6 and 7 allowed for feature repetitions, and both feature repetitions and target-distractor contingencies, respectively. The manipulations were designed to modulate the size of conflict adaptation effects and are not directly relevant to our goal here. Participants completed eight blocks of 128 trials (Datasets 5 and 6) or six blocks of 120 trials (Dataset 7), of which the first two or one respectively were considered practice blocks. In Dataset 7 and the practice blocks of Dataset 6, trials that produced an error or an RT >3000ms were repeated at the end of a block. We include all trials in our analysis here as we previously did not observe discernible practice effects in comparable tasks, but we did observe a benefit to having more trials overall (Hedge, Powell, & Sumner, 2018b; Supplementary Material D).

Data analysis
We applied the same data analysis procedure to all datasets. We excluded participants who were below 60% accuracy in any task in each dataset. We used a relatively lenient inclusion criterion in order not to limit variance between participants. In Supplementary Material B we reran our analyses with a cut-off of 80%, and this did not alter our conclusions.
We removed RTs that were less than 100ms, and greater than the median plus three times the median absolute deviation for each individual in each condition.
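As a concrete illustration, this trimming rule can be written as follows (our sketch in Python; the reported analyses were run in Matlab, and the function name is ours):

```python
import numpy as np

def trim_rts(rts, floor=0.100):
    """Drop RTs below 100 ms or above the participant/condition median
    plus three times the median absolute deviation (MAD), as described above."""
    rts = np.asarray(rts, dtype=float)
    med = np.median(rts)
    mad = np.median(np.abs(rts - med))
    return rts[(rts >= floor) & (rts <= med + 3 * mad)]
```

Note that this uses the raw (unscaled) MAD; it would be applied separately for each individual in each condition.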

Model fitting
To fit the DMC to experimental data, we adapted the approach of White et al. (2017).
We estimated seven parameters of the DMC separately for each participant in each task. The parameters representing conflict processing were the amplitude of automatic activation (A for congruent trials, -A for incongruent trials) and the time to peak automatic activation (tau).
The non-conflict decision parameters are boundary separation (b), drift rate of the controlled process (µc), and the shape parameter of the beta distribution used to represent starting points of the accumulation process (α). Finally, non-decision time is implemented as a Gaussian distribution with parameters for the mean (Ter) and variability (TerSD). In Datasets 3 and 4, we estimated additional boundary separation parameters to capture the experimental manipulations. In Dataset 4, we estimated three separate boundary separation values to capture strategic differences between blocks in which we emphasised either speed, accuracy, or both speed and accuracy. We calculated the between-task correlation in boundary separation under each instruction condition, and entered all three into our meta-analysis. In Dataset 3 (intermixed vs. blocked Simon task), we derived separate boundary separation estimates for congruent-only and incongruent-only blocks. As our mixed-trial Simon variant produced a single boundary separation estimate, we averaged the two values from the blocked variant to obtain a single correlation for this parameter.
For datasets 1, 2 and 4, we also had data from a neutral condition, which we included in the fitting with the amplitude of the automatic activation fixed to zero. For each participant within each task only the amplitude parameter provides the difference between congruent, neutral and incongruent trials; all other parameters were constrained to be equal across conditions. As with Ulrich et al. (2015), the diffusion constant/within-trial noise (σ) was fixed to 4. We fixed the shape parameter of the automatic activation function to 2 for all tasks, following Ulrich et al. (2015).
We accuracy-coded our data, so that the upper and lower response boundaries correspond to thresholds for correct and incorrect responses respectively. Note that the DMC is a model of a two-choice task, whereas some of our datasets contained four-choice tasks.
Multi-choice tasks can be accommodated by accuracy coding, which, while not ideal, allowed us to interpret all the datasets within a common framework. Correct and incorrect RTs from congruent, neutral (where available), and incongruent conditions were separately binned into quantiles. Correct RTs were binned into five quantiles (.1, .3, .5, .7, .9) for each condition separately. The same approach was applied to incorrect RTs in each condition when the total number of errors in that condition was ≥ 10. When between 5 and 10 errors were made, three quantiles were used (.3, .5, .9) for incorrect RTs. If fewer than 5 errors were made, we fit the median RT of the errors. We calculated the deviance (-2 log-likelihood) between observed and simulated quantiles, which was minimised with a Nelder-Mead simplex (Nelder & Mead, 1965) implemented in the fminsearch function in Matlab. We constrained the search such that all free parameters were positive, and the shape of the starting point distribution was greater than one.
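The quantile-based deviance can be sketched as follows for a single condition's correct RTs. This is our simplified Python illustration of the general approach (the actual routine was run in Matlab and handles correct and error RTs separately, as described above):

```python
import numpy as np

def binned_deviance(obs_rts, sim_rts, qs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """G^2-style deviance: 2 * sum over bins of n_obs * ln(p_obs / p_sim),
    where bins are defined by the observed RT quantiles."""
    obs_rts = np.asarray(obs_rts)
    sim_rts = np.asarray(sim_rts)
    edges = np.quantile(obs_rts, qs)
    # observed proportions per inter-quantile bin (e.g. .1, .2, .2, .2, .2, .1)
    p_obs = np.diff([0.0, *qs, 1.0])
    n_obs = p_obs * len(obs_rts)
    # proportion of model-simulated RTs falling in each observed bin
    counts = np.histogram(sim_rts, bins=[-np.inf, *edges, np.inf])[0]
    p_sim = np.clip(counts / len(sim_rts), 1e-5, None)  # guard against empty bins
    return 2.0 * np.sum(n_obs * np.log(p_obs / p_sim))
```

A model whose simulated RT distribution matches the observed quantiles yields a small deviance; systematic mismatch (e.g. a shifted distribution) inflates it, which is what the simplex search minimises over parameter sets.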
We first fit the data using 5000 parameter sets generated from a uniform distribution within the minimum and maximum values given in Table 2 (based on White et al., 2017), with simulations consisting of 5000 trials per condition. We then took the 15 best parameter sets resulting from this initial search, and submitted each of those to the simplex algorithm, in which we simulated 10,000 trials per condition at each iteration. The simplex was reinitialised 3 times to avoid local minima. After the process was completed, we took the single best fitting parameter set for each individual. This process took approximately 30-40 hours per individual per task, and was performed on Cardiff University Brain Research Imaging Centre's (CUBRIC) high performance computing cluster.

We are the first to apply the DMC to a Stroop task, and we noticed during preliminary examination of our data that our fitting routine would typically converge to values outside our initial search space for the non-decision time, time-to-peak, and shape of the starting distribution parameters. Unlike the flanker and Simon tasks, participants did not make fast errors in our Stroop task (see Supplementary Material D). We therefore refit the Stroop data using a higher range of starting parameters, noted in Table 2. It is plausible that interference in the Stroop task has a later time course compared to the flanker or Simon task, since semantic word processing is expected to be slower than processing of location or simple visual symbols. This is supported by evidence from event-related potentials (ERPs) in separate studies, where ERPs diverge earlier between conditions in the flanker task (Kałamała, Szewczyk, Senderecka, & Wodniecka, 2018) compared to manual Stroop tasks (Liotti, Woldorff, Perez, & Mayberg, 2000). We also used the higher range of non-decision time when fitting Datasets 5 to 7, as these datasets typically had slower RTs.

Meta-analysis of correlations
We calculated Spearman's rho correlations for each pair of tasks and each parameter (e.g. the correlation between the amplitude parameter from the flanker task in dataset 1 with the amplitude parameter from the Simon task in dataset 1). This produced 13 correlations for each parameter (15 for boundary separation). These correlations were then meta-analysed using a multilevel random effects meta-analysis, implemented in the metafor package in R (R Core Development Team, 2017; Viechtbauer, 2010). The multilevel approach allows us to account for the possibility that correlations taken from the same dataset (as with datasets 4 to 7) may be more similar to each other than correlations taken from independent datasets.
We also calculated the I² statistic for each parameter (c.f. Viechtbauer, 2019), which is interpreted to represent the heterogeneity of the observed effects. An I² of 0% would indicate that all the variability in the observed effect size estimates is due to sampling error, rather than 'real' differences between datasets and task pairs. We interpret I² values of 25%, 50% and 75% as low, moderate, and high levels of heterogeneity respectively (Higgins, Thompson, Deeks, & Altman, 2003).
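For intuition, the core random-effects computation can be sketched in Python. This is a simplified DerSimonian-Laird estimator on Fisher-z transformed correlations (the reported analysis used a multilevel model in the metafor R package, which additionally models dependency between correlations from the same dataset):

```python
import numpy as np

def random_effects_meta(rs, ns):
    """Pool correlations rs (with sample sizes ns) via Fisher-z transformation.
    Returns the pooled correlation and the I^2 heterogeneity statistic (%)."""
    z = np.arctanh(np.asarray(rs, dtype=float))   # Fisher-z transform
    v = 1.0 / (np.asarray(ns, dtype=float) - 3)   # sampling variance of z
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    Q = np.sum(w * (z - z_fixed) ** 2)            # heterogeneity statistic
    df = len(z) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - df) / c)                 # between-study variance (DL)
    w_re = 1.0 / (v + tau2)
    z_re = np.sum(w_re * z) / np.sum(w_re)
    i2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0
    return np.tanh(z_re), i2
```

With homogeneous inputs the pooled estimate sits near the simple mean and I² is near 0%; widely dispersed correlations drive I² toward 100%.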

Meta-analysis of model parameters. Our main question concerns the correlations between tasks for the model parameters (Figure 3). We report the results of this analysis first, before considering factors that might moderate our conclusions, such as the reliability of the data and model fits. If we assume that factors such as general processing speed and strategy confound behavioural measures of 'inhibition', then separating these out using a cognitive model may reveal correlations in the parameters representing conflict processing: the amplitude and time-to-peak of automatic activation. Figure 3 shows the weighted average correlation for each parameter, along with the individual correlations for each pair of tasks.
These correlations correspond to less than 1% of common variance on average, providing no support for the hypothesis of a common mechanism of conflict processing between tasks. The low I² values suggest this to be the case consistently across all datasets. We again draw particular attention to Dataset 3, which did not deviate from the trend of low correlations in amplitude (r=.04) and time-to-peak (r=-.07) despite consisting of the same Simon task performed with intermixed and blocked trials. In contrast, we observed consistent positive correlations between tasks in the drift rate of the controlled process and in boundary separation. These parameters represent the efficiency of processing (i.e. general processing speed) and response caution respectively, two processes highlighted as confounds in the literature (e.g. Hedge, Powell, Bompas, et al., 2018; Jewsbury et al., 2016). Finally, we also observed significant positive correlations in the mean and variability of non-decision time, as well as in start point variability. The model parameter correlations therefore provide good evidence for commonality in the mechanisms underlying general performance in conflict tasks, but not for the conflict and inhibition processes themselves.
Behavioural performance. Given the number of tasks/datasets, we report the means and standard deviations of RT and error rates in Supplementary Material C. In all tasks, we observed the expected pattern of increased error rates and slower RTs in incongruent trials relative to congruent trials. The four-choice tasks (Datasets 5 to 7 and the Stroop task in Datasets 2 and 4) typically produced slower RTs (~500 to 800ms) than the two-choice tasks (~350 to 500ms).
For completeness, we applied the same meta-analytic approach to the behavioural costs, observing small but significant positive correlations between tasks in both RT costs and error costs. We also stress that a motivation for applying a cognitive model to these data is that we assume that RT costs can be contaminated by non-conflict decision mechanisms; we therefore do not interpret these effects in behavioural costs as specifically reflecting common mechanisms of inhibition. We return to this in simulations below.

Reliability.
We quantified the reliability for our behavioural measures by calculating the ICC(2,K) on odd and even trials for each measure and task. These are reported in Supplementary Material C. We meta-analysed the reliabilities in the same way as our performance measures, and focus on the average reliability here. The reliability of mean error rates (ICC=.88) and mean RT (ICC=.94) were excellent (Landis & Koch, 1977). As expected from previous examinations of the reliability of key indicators from these tasks (Enkavi et al., 2019;Hedge, Powell, & Sumner, 2018b;Paap & Sawi, 2016), the reliability of the RT costs and error costs was lower than mean RTs and error rates, though they showed nominally good levels of reliability (ICC = .64 and .65).
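The split-half computation can be sketched as follows: an illustrative implementation of ICC(2,k) from the two-way ANOVA mean squares (Shrout & Fleiss conventions), applied to synthetic odd/even-half scores rather than our task data.

```python
import numpy as np

def icc_2k(scores):
    """ICC(2,k) for an (n_subjects, k_measures) array via two-way ANOVA."""
    scores = np.asarray(scores, float)
    n, k = scores.shape
    grand = scores.mean()
    msr = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)  # subjects
    msc = n * np.sum((scores.mean(axis=0) - grand) ** 2) / (k - 1)  # measures
    sse = np.sum((scores - grand) ** 2) - msr * (n - 1) - msc * (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# synthetic example: stable person-level mean RTs plus half-specific noise
rng = np.random.default_rng(1)
true_rt = rng.normal(500, 50, 200)
halves = np.column_stack([true_rt + rng.normal(0, 10, 200) for _ in range(2)])
icc = icc_2k(halves)   # close to 1: the halves share most of their variance
```

With k=2 this treats the odd- and even-trial means as two 'raters' of each participant, so the ICC reflects how much of the between-person variance is stable across halves.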
Due to the computational time required to fit the diffusion model for conflict tasks to all our datasets (approximately 10 years in total across all processing cores), it was unfeasible to obtain reliability estimates for our model parameters in every dataset. However, we have previously examined the four-week test-retest reliability of the DMC fit to Dataset 4 (Hedge et al., 2019). As we would expect behaviour within a session to be more reliable than behaviour with a four-week separation, these can be treated as a lower bound for the reliability of the DMC parameters for our current purposes. Focusing on the conflict parameters, we observed moderate reliability for the amplitude parameter (ICC = .55 and .47 in the flanker and Stroop task respectively), though the reliability of the time-to-peak parameter was poor (ICC = -.04 and .19). These suggest that our ability to observe between task correlations in conflict mechanisms here may be limited to the amplitude parameter. If the DMC is an appropriate model for these tasks, then the best fitting parameters should reproduce both individual differences in the data, as well as capture key data patterns.

Model fits and sanity checks.
We evaluated the model fits by calculating Pearson correlations for accuracy and RT quantiles (25th, 50th, 75th) of the observed data against data simulated using the best fitting model parameters for each participant (Voss, Voss, & Lerche, 2015). RTs for correct and incorrect responses were evaluated separately. We illustrate this with incongruent trials from two tasks in Figure 4, which are representative of the range of fits we observed. In addition, we evaluated the extent to which the fits could qualitatively reproduce the conditional accuracy functions and delta plots in the observed data. We elaborate on the fits in Supplementary Material E for those interested in using the DMC and focus here on the implications for our interpretations of the model parameters. Focusing first on individual differences, the model fits generally captured accuracy well. The minimum correlations between observed and simulated accuracy for any task/dataset were r=.73 and r=.86 for congruent and incongruent trials respectively. Correct RTs were also captured well across all RT quantiles for congruent (minimum r=.85) and incongruent trials (minimum r=.91). The reproduction of incorrect RT quantiles showed more variability, ranging from .61 to .96 for incongruent trials. This is to be expected as error RTs are based on fewer trials, so the estimates are noisier. Notably, the model tended to systematically underestimate RTs for tasks that had slower RTs overall, particularly for errors (Stroop, Datasets 5 to 7; see Figure 4).
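The individual-differences part of this fit check can be sketched as follows, using synthetic stand-ins for the observed and model-simulated RTs (the numbers here are placeholders, not our data):

```python
import numpy as np

rng = np.random.default_rng(3)
n_p, n_t = 50, 200                                   # participants, trials each
person_mean = rng.normal(500, 60, (n_p, 1))          # stable individual differences
observed = person_mean + rng.normal(0, 80, (n_p, n_t))
simulated = person_mean + rng.normal(0, 80, (n_p, n_t))  # stand-in for model output

# correlate observed vs simulated RT quantiles across participants
rs = []
for q in (0.25, 0.50, 0.75):
    obs_q = np.quantile(observed, q, axis=1)
    sim_q = np.quantile(simulated, q, axis=1)
    rs.append(np.corrcoef(obs_q, sim_q)[0, 1])
```

High correlations arise here because both 'observed' and 'simulated' quantiles are driven by the same person-level means; in the real analysis the simulated RTs come from each participant's best fitting parameters instead.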
A consequence of the underestimation of slow incongruent RTs was the underestimation of the RT cost in tasks with slower (correct) RTs. We elaborate on this behaviour in Supplementary Material E and consider the theoretical implications of these patterns in the discussion. We opted to include all the datasets in our meta-analysis despite this observation. We reasoned that the pattern of fast errors in most tasks was reflected in the model fits, which indicates that they are capturing the timing and strength of conflict effects to some degree. Further, the strong positive correlations in accuracy and RT quantiles indicate that individual differences are being captured by the model. The consistency of the conflict parameter correlations observed in our meta-analysis, indicated by the low I² values, suggests that our conclusions are not dependent on the inclusion of particular datasets.
Summary of empirical data. Overall, we observe weak or no correlation between tasks in DMC parameters representing conflict processing. However, we do observe consistent correlations in model parameters reflecting non-conflict decision processes. We see small but significant correlations in RT costs, though these could also be driven by common variance in strategy and processing speed across tasks. A critical step towards interpreting these effects is to understand the source(s) of individual differences in these measures.

Simulation study: The evidential value of performance measures in conflict tasks
We might interpret the weak correlations between measures of conflict processing in our datasets as an indication of independent mechanisms underlying each task. However, a domain-specific account of conflict control is difficult to apply to Dataset 3, where the intermixed and blocked variants of the Simon task share many surface level characteristics.
There is evidence in the literature that the blocking manipulation we used may affect the processing demands of a task. For example, it has been suggested that intermixing trials in the antisaccade task places greater demands on working memory and attentional control because the task goal can vary from trial to trial (Unsworth et al., 2004). For our current purposes, the key point is not that processing is unaffected by changes to the task, it is that we observe little or no correlation in conflict measures when we try to match the tasks more closely. This is true in both model parameters and behavioural costs, despite the latter showing good reliability in this dataset. If we assume that the mechanisms of conflict processing are still shared between the two Simon task variants, then this finding suggests that it is difficult to isolate individual differences in conflict processing among other processes that contribute to behaviour.
Despite the absence of correlations in model parameters in our empirical data, we did observe a small but significant positive correlation in RT costs, as well as a similar correlation in error costs. We cautioned against the interpretation that these are evidence of common mechanisms of conflict processing, as we have previously shown that performance costs do not isolate ability in a specific cognitive domain (Hedge, Powell, Bompas, et al., 2018; Hedge, Powell, & Sumner, 2018a; see also Draheim et al., 2016; Miller & Ulrich, 2013). However, this is not to say that they lack any information. Researchers may use difference scores with the knowledge that they are not perfect, but under the assumption that they carry some information about individual differences in conflict. One advantage of a cognitive model is that it allows us to evaluate this possibility through simulation. In order to draw meaningful conclusions about common mechanisms of conflict processing from a measure, it should have two properties in plausible scenarios. First, we want to know whether a correlation in performance is a necessary consequence of common mechanisms of conflict processing. In other words, when we impose a correlation in conflict parameters, we should see a correlation in RT costs and/or error costs. Second, we want to know if a correlation in performance measures is sufficient evidence of common conflict mechanisms. If we accept that RT costs and error costs can be contaminated by other factors, then at least they need to reflect common conflict mechanisms more than non-conflict mechanisms. Together, these properties might allow us to distinguish between a world in which common mechanisms of conflict exist and one in which they do not.
We conducted a set of simulation studies to assess these criteria. We imposed correlations in conflict model parameters (amplitude and/or time-to-peak) between two tasks to represent a common mechanism for conflict. We then compared this to an alternative, in which there are no correlations in conflict parameters, but the non-conflict decision parameters (drift rate and boundary separation) were correlated instead. We tested how this underlying structure would emerge in RT costs and error costs. Our simulations have the additional benefit that we are not limited by measurement noise due to low trial numbers or reliability, so this approach provides a theoretical upper limit for the effect sizes we could expect to see in real data.

Method
We based our parameter ranges on a previous parameter recovery study (White et al., 2017), which themselves were based on previous studies that had applied the DMC (Servant, White, Montagnini, & Burle, 2016;Ulrich et al., 2015). White et al. observed high correlations between simulated and recovered parameters (r>.93 for all parameters when shape is held constant), so we can be confident that these ranges produce discriminable variation in behaviour.
We simulated multiple scenarios that varied on three dimensions. The first dimension reflected different hypothetical tasks. We simulated hypothetical Simon, flanker, and Stroop tasks by varying the average value of the time-to-peak parameter to match what we observed in our model fits. We did this because this parameter has previously accounted for differences in behavioural patterns between tasks (Ulrich et al., 2015), and we reasoned that these different dynamics may affect the correlations observed in RT costs and error costs. For simplicity, we used the same means and standard deviations for the parameters in both simulated tasks within each scenario (see Table 3). We reasoned that correlations should be larger in two versions of the same task (e.g. two flanker tasks) as compared to different tasks, as they should produce more similar patterns of behaviour. We also used the same mean and variance for the other parameters across all tasks to aid comparisons. The second dimension that we varied was which parameters were correlated. The first three options represented scenarios in which the two tasks had common mechanisms of conflict processing. These corresponded to a correlation in the amplitude parameter only, the time-to-peak parameter only, and both the amplitude and the time-to-peak parameters. In the fourth scenario, the conflict parameters were uncorrelated, and we instead imposed a correlation in drift rate and boundary separation; the non-conflict decision parameters. This fourth scenario allows us to evaluate whether correlations in behaviour can arise in the absence of common conflict mechanisms. We assumed no correlation (r=0) for all parameters other than those named in each scenario.
The third dimension that we varied was the magnitude of the correlation that we imposed (r = .3, .5 and .7). We did this in order to evaluate whether RT costs and error costs were sensitive to changes in correlation in the underlying mechanisms.
For each scenario and effect size, we simulated datasets for 2000 'participants' comprised of 5000 congruent and 5000 incongruent trials. This is more trials than would typically be run in an empirical study, but it allows us to minimise the impact of noise on our estimates. We expect behavioural correlations with lower trial numbers would be smaller.
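The trial-level simulation can be sketched as a simple Euler discretisation of a DMC-style decision process, in which a transient automatic activation (shape fixed at 2, as in our fits) is superimposed on a constant controlled drift. All parameter values and names below are illustrative placeholders, not our Table 3 values.

```python
import numpy as np

def simulate_trial(congruent, rng, mu_c=0.5, zeta=20.0, tau=50.0,
                   bound=75.0, t_er=300.0, sigma=4.0, dt=1.0):
    """One DMC-style trial: returns (RT in ms, correct?); bounds at +/-bound."""
    sign = 1.0 if congruent else -1.0   # automatic activation helps or hinders
    x, t = 0.0, 0.0
    while abs(x) < bound:
        t += dt
        # derivative of the expected automatic activation with shape a = 2,
        # where E[Xa](t) = zeta * (t * e / tau) * exp(-t / tau)
        auto = sign * zeta * (np.e / tau) * np.exp(-t / tau) * (1.0 - t / tau)
        x += (mu_c + auto) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return t + t_er, x > 0   # upper bound taken as the correct response

rng = np.random.default_rng(2)
con = [simulate_trial(True, rng) for _ in range(500)]
inc = [simulate_trial(False, rng) for _ in range(500)]
rt_cost = np.mean([rt for rt, _ in inc]) - np.mean([rt for rt, _ in con])
acc_con = np.mean([ok for _, ok in con])
acc_inc = np.mean([ok for _, ok in inc])
```

The early, transient pull of the automatic activation toward the wrong bound on incongruent trials is what produces fast errors and a positive RT cost in this sketch.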
Parameters were generated from a multivariate normal distribution using Matlab's mvnrnd function. This allows for the generation of two variables with specified means, standard deviations, and covariances (correlations). We derived the standard deviations by dividing the range of the uniform distributions used by White et al. (2017) by six, in order to obtain a similar range. In other words, the upper limit of the uniform distribution used by White et al. corresponds to 3 standard deviations above the mean of the normal distribution used in our simulation. For simplicity we did not include variability in non-decision time, and we fixed the shape parameter for automatic activation to 2, as in our empirical fits and Ulrich et al. (2015).
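This generation step can be sketched in Python (numpy's multivariate_normal as the analogue of Matlab's mvnrnd), with illustrative placeholder means and ranges rather than the values from Table 3:

```python
import numpy as np

def draw_correlated(mean, lo, hi, r, n, rng):
    """Draw one parameter for two tasks with between-task correlation r.
    The SD is (hi - lo) / 6, mirroring the range/6 rule described above."""
    sd = (hi - lo) / 6.0
    cov = np.array([[sd ** 2, r * sd ** 2],
                    [r * sd ** 2, sd ** 2]])
    return rng.multivariate_normal([mean, mean], cov, size=n)

rng = np.random.default_rng(0)
# hypothetical amplitude parameter, correlated r = .7 across two tasks
amp = draw_correlated(mean=20.0, lo=10.0, hi=30.0, r=0.7, n=2000, rng=rng)
obs_r = np.corrcoef(amp[:, 0], amp[:, 1])[0, 1]   # close to the imposed .7
```

Each row of the returned array is one simulated 'participant', with one parameter value per task; parameters named in a scenario share the off-diagonal covariance, all others are drawn with r = 0.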

Results and discussion
Performance correlations are not necessary evidence for common mechanisms of conflict processing. Spearman's rho correlations between performance measures calculated from the two simulated tasks are shown in Figure 5. First, we evaluated whether correlations in performance are necessary given that there are correlations in the conflict parameters. The white/pale sections in the first three scenarios illustrate that this condition is not met. It was possible to observe no correlation in both RT costs and error costs in the presence of very strong (r=0.7) correlations in the time-to-peak parameter.
The correlation in RT costs generally increased as the underlying correlation in the amplitude parameter increased, and was largest in the scenarios where correlations were imposed in both the amplitude and time-to-peak parameters. However, the behavioural correlations were heavily attenuated in some cases. For example, whereas a correlation of rho=.52 was observed in RT costs in the Simon task when the correlation in both amplitude and time-to-peak was very strong (r=.7), the corresponding correlation in the Stroop scenario was small (rho=.21). This occurs because independent variance in the non-conflict parameters masks the effect of the conflict parameters. The general pattern of the underlying correlation being underestimated in behavioural costs could lead researchers to incorrect conclusions about the presence or absence of common mechanisms. Note that most correlations in RT and error costs predicted in the first three scenarios are below what is traditionally considered moderate (.3), except when the correlation in amplitude is very large (.7), or both the amplitude and time-to-peak parameters show strong (>.5) correlations. Based on our empirical fits, where the largest correlation we saw in conflict parameters in any dataset was rho=.19, we do not expect underlying correlations in currently used tasks to be strong.

Figure 5. Spearman's rho correlations between performance costs calculated from two simulated datasets using the diffusion model for conflict tasks. The strength of the between-task correlation in the conflict parameter(s) is given in the "Simulated effect size" column. The columns to the right of this show the between-task correlations in the simulated error and RT costs respectively. The correlation between other model parameters (boundary separation, drift rate and non-decision time) was set to 0 in the first three scenarios. In the fourth scenario, the correlation in conflict parameters was set to zero, and the non-conflict parameter correlations were set to the magnitudes observed in our meta-analysis. We used the same parameter ranges for both tasks within each scenario. For example, the 'Simon' column shows the correlations between two versions of a Simon task. Note that the correlations in the fourth scenario are comparable to, and in some cases exceed, those observed in the first three scenarios.
Performance correlations are not sufficient evidence for common mechanisms of conflict processing. Next, we evaluated whether it is possible to observe correlations in RT costs and error costs in the absence of common mechanisms of conflict processing. In the fourth scenario, the mechanisms underlying conflict processing are independent (r=0), but we imposed correlations in parameters representing strategy and general processing efficiency.
The key observation here is that the correlations can be similar to, and even exceed, those we see in the first three scenarios. This illustrates that non-conflict processes (e.g. strategy, processing speed) can create apparent correlations in measures of 'inhibition' when the mechanisms of conflict processing are in fact independent. When we combine this with the observation that correlations in conflict parameters do not always translate into behaviour, we can conclude that correlations in performance costs are neither necessary nor sufficient to infer there are common underlying conflict mechanisms.
The magnitude of the correlations we observe in the fourth scenario may surprise some readers, though they are in line with previous simulations (Hedge, Powell, Bompas, et al., 2018;Hedge, Powell, & Sumner, 2018a). The reason is that both RT costs and error costs are correlated with drift rate and boundary separation, and we impose a correlation on both these parameters simultaneously here, so they have a strong impact on behaviour. We show the correlations between the behavioural measures and parameters in Supplementary Material F.

Caveats and considerations.
A key inference from our simulations is that individual differences in non-conflict decision processes could mask individual differences in conflict processing in performance measures. In our first three scenarios, our simulated individuals varied in boundary separation and drift rate, but this variation was uncorrelated between tasks, and therefore adds 'noise' to the performance measures. The extent of noise is dependent on the standard deviations used to generate the parameters (see Table 3). For example, if we used smaller standard deviations for boundary separation, we would see stronger correlations in our performance measures as a function of the conflict parameters.
The standard deviations we chose were based on previous simulations (White et al., 2017) and empirical observations (Ulrich et al., 2015); however, we observed greater variance in several parameters in the fits to our data (see Supplementary Material D). To check the robustness of our conclusions, we conducted an additional simulation in Supplementary Material F. As in our main simulation, we simulated data for two tasks with a large between-task correlation (r=.7) in the amplitude and time-to-peak parameters, but now generating parameter sets using the means and standard deviations we observed in the DMC fits to our flanker and Simon data. The resulting between-task correlations in simulated performance measures did not exceed those reported for the analogous scenarios in Figure 5. Thus, our interpretation that variation in conflict processing parameters has a relatively small effect on behaviour is not dependent on our initial assumptions about the parameter ranges.
A second consideration is that we simulated the scenarios of common conflict and common non-conflict mechanisms in isolation. When we assumed that the amplitude and time-to-peak parameters were correlated, we assumed that drift rate and boundary separation were uncorrelated, and vice-versa. In reality these are not mutually exclusive: it is possible that both conflict and non-conflict processes are correlated in some scenarios, with both contributing to positive correlations in performance costs. However, the challenge faced by researchers remains the same: The magnitude of a correlation between RT costs and error costs cannot be interpreted as a direct measure of the correlation in conflict processing or 'inhibition'.
We reiterate that our simulations represent scenarios where the underlying variance is not restricted (because the parameters can be recovered well; White et al., 2017), where the variance is similar between the two tasks, and where there is minimal noise in the behavioural measures due to the large number of simulated trials.
Summary of simulations. The general patterns from our two simulation studies highlight the difficulty in examining correlations between different response control tasks.
Measures of performance do not uniquely reflect the mechanisms of interest in our assumed model. Further, small to moderate correlations in performance are plausible even when very strong (r=.7) correlations are imposed on the mechanisms of interest.

Discussion
The overarching questions we address here are: is there a common mechanism of conflict processing underlying performance across 'inhibition' tasks and, if there were, would we be able to detect it from RT and error costs? Our data and simulations suggest the presence or absence of correlations across conflict tasks is only weakly informative as to whether common conflict control mechanisms underlie performance. In a meta-analysis of model parameters fit to multiple empirical datasets, we observed a general pattern of weak or no correlations in measures of conflict processing. This pattern persists even when we examine two variants of the same task, which we assume share more common elements of processing than tasks from different conflict domains.
Finally, our simulations demonstrate that detecting correlations that can be specifically attributed to conflict processing would be difficult, as parameters reflecting response caution and general processing efficiency contribute substantially to performance measures. These confounding parameters add noise if they are uncorrelated between tasks, potentially leading us to conclude that conflict processing mechanisms are relatively independent. Alternatively, if these general processes are correlated between tasks, as they seem to be in the datasets presented above, they drive correlations in performance measures and could thus mislead researchers searching for common conflict mechanisms.

Should we stop thinking about individual differences in 'inhibition'?
The construct of response control or response inhibition has been a core component of cognitive theorising for at least several decades (Logan, Cowan, & Davis, 1984; Miyake et al., 2000), and one that has been heavily implicated in neuropsychological disorders and brain dysfunction (Bari & Robbins, 2013; Chambers et al., 2009). Some researchers have posed the question of whether inhibition is a useful psychometric construct, citing low and inconsistent correlations reported in the literature and their own data. Instead, they suggest that the ability to resolve interference is task specific, challenging the often-made assumption that performance on any given response control task can be interpreted in a broader context.
Our findings are consistent with this position, but highlight that it is very difficult to draw any conclusions about inhibition constructs at all from either absence or presence of behavioural correlations.
One clear finding from our meta-analysis was that we consistently observed little correlation in conflict-related model parameters. Because we could not detect a correlation between conflict parameters even for the same task performed blocked or intermixed (Dataset 3), we are inclined to conclude that it is simply too difficult to recover meaningful information about conflict by correlating tasks (Rouder, Kumar, & Haaf, 2019). Together with our simulations, this suggests that individual differences in caution, strategy and overall processing speed swamp any real individual differences in conflict-control in both behavioural costs and model fitting.
The answer to the question of whether we should stop thinking about inhibition as a general construct likely depends on why the researcher is interested in it. Researchers who are interested in answering theoretical questions about the structure of executive functions (e.g. Friedman & Miyake, 2004) often administer multiple conflict tasks and use latent variable approaches to account for measurement error, and small but non-zero correlations can be theoretically meaningful. Research in this area is likely to continue, seeking improvements to task design and measurement (Draheim, Tsukahara, Martin, Mashburn, & Engle, 2019; Rey-Mermet, Gade, Souza, von Bastian, & Oberauer, 2019). In contrast, to some researchers inhibition tasks are one of many tools that can be used to understand individual differences in outcomes such as cognitive development (Carver, Livesey, & Charles, 2001; Dahlin, 2011), neuropsychological conditions (Hutton & Ettinger, 2006), or impulsivity (Skippen et al., 2019). Researchers in these contexts may use a single task, implicitly assuming it is representative of inhibition measures in general. Therefore, large correlations between tasks are a prerequisite for interpreting performance as a measure of general inhibitory ability. Our data do not support such a generalisation. Instead, researchers in these areas might be better served by focusing on tasks that are sensitive to the domain of interest (cf. Hutton & Ettinger, 2006).

Common non-conflict processes in conflict tasks
Our meta-analysis revealed consistent evidence for moderate to strong correlations in drift rate and boundary separation, which represent the efficiency of controlled processing and strategy/caution respectively. These parameters are notable because our simulations show that these non-conflict processes contribute substantially to RT costs and error costs (see also Hedge, Powell, Bompas, et al., 2018; Hedge, Powell, & Sumner, 2018a; Miller & Ulrich, 2013). These findings also converge with evidence from factor analytic studies that performance in inhibition tasks can be (at least partly) accounted for by processing speed (Jewsbury et al., 2016; Rey-Mermet et al., 2019), or goal maintenance and implementation (Friedman & Miyake, 2017; Kane & Engle, 2003). Overall, it appears that there are common mechanisms underlying performance in inhibition tasks, though they are not unique to conflict processing.
Our findings and approach contribute to the discussion in several ways. First, multiple studies have assumed that strategy may confound the measurement of individual differences and take steps to control for it (e.g. Draheim et al., 2016;Rey-Mermet et al., 2019). However, they do not measure response caution and examine whether it correlates across tasks as we do here. Second, the finding that general processing speed is sufficient to account for individual differences in inhibition tasks in factor analytic studies is partly based on a failure to derive a unique inhibition factor (Rey-Mermet et al., 2019). By using a model to dissociate and quantify the efficiency of controlled processing, captured by the drift rate parameter, we can provide positive evidence for common mechanisms.
Finally, though we draw parallels between the drift rate parameter and latent perceptual/processing speed factors identified in factor analytic studies (Hedden & Yoon, 2006;Jewsbury et al., 2016), it is not a given that they refer to the same underlying ability. A perceptual speed task might involve comparing the size of two letter strings to determine which is longest, with performance measured by the number completed in a fixed time limit (Hedden & Yoon, 2006). A latent variable is then derived from behaviour across multiple tasks assumed to measure the same construct. In contrast, a cognitive model attempts to dissociate latent processes that contribute to behaviour within a task. From an evidence accumulation model perspective, individual differences in this 'perceptual speed' factor could also be driven by some combination of drift rate, boundary separation, and non-decision time.
These two approaches to capturing latent psychological processes are not mutually exclusive, and some studies have used diffusion model parameters in a factor analysis in place of behavioural measures (e.g. Schmiedek et al., 2007). Such an integration may be a useful approach to overcoming the impurity of behavioural measures that we evidence here.

Alternative models
Our approach makes several assumptions about the mechanisms underlying performance in response control tasks. We emphasise that we do not know the true model for empirical data; it is possible that the DMC is a mischaracterisation of the mechanisms of response control. We chose the framework of evidence accumulation models because they have previously offered valuable insights into individual differences in choice RT behaviour (e.g. Hedge, Powell, Bompas, et al., 2018; Ratcliff et al., 2015). Further, we chose the DMC specifically because we needed a common framework for all tasks, whereas some alternative models invoke task specific mechanisms (White, Ratcliff, & Starns, 2011).
However, it is reasonable to ask whether we would have reached different conclusions had we used a different evidence accumulation model, or a different family of models altogether.
It is common for evidence accumulation models to show a high degree of mimicry.
Different models can often reproduce the same data patterns even though they make different assumptions (Donkin, Brown, Heathcote, & Wagenmakers, 2011; Teodorescu & Usher, 2013). There are alternative sequential sampling models that have been applied to response control tasks, which involve extensions from standard diffusion or accumulator models (Bompas, Campbell, & Sumner, 2019; Bompas, Hedge, & Sumner, 2017; Bompas & Sumner, 2011; Dillon et al., 2015; Hubner, Steinhauser, & Lehle, 2010; Noorani & Carpenter, 2013; Weigard, Heathcote, & Sripada, 2019; White et al., 2011). Many of these extensions are designed to capture the observation that errors to incongruent stimuli are typically fast in tasks such as the flanker. They do this by assuming that there is a non-linearity in the evidence accumulation process; information from the prepotent stimulus feature contributes more to the early period of the decision than it does to the late period. If we were to examine the evidence for common mechanisms in a different model, then we would inevitably look at correlations in the parameters responsible for this non-linearity. We expect that this would not lead to different conclusions than we arrive at here, because we still face the challenge that these mechanisms contribute only in part to individual differences in behaviour, relative to parameters representing strategy or overall processing speed. These conclusions are not specific to the DMC, as we have shown they also arise from the drift-diffusion model (Hedge, Powell, & Sumner, 2018a), nor are they specific to evidence accumulation models (Miller & Ulrich, 2013; Pachella, 1974).
Outside of the accumulation model framework, different modelling approaches have been applied to conflict tasks. Perhaps most notable is the Stroop task, for which there are models based in a connectionist framework (e.g. Cohen, Dunbar, & McClelland, 1990), reinforcement learning (Verguts & Notebaert, 2009), and others (for a review, see Chuderski & Smolen, 2016). These models do not necessarily conflict with an evidence accumulation model account, and they sometimes share similar assumptions (Hubner et al., 2010;van Maanen & van Rijn, 2007). Here, we started with the working assumption that all tasks could be explained using a common framework. Instead, there may be value in using different models that are tailored to the assumptions underlying each task and examining correlations in conceptually related parameters across different models. For our current purposes, alternative models would still need to deal with the difficulty in distinguishing individual differences in conflict processing amongst the other processes that contribute to behaviour.
An alternative model might provide better quantitative fits to some of our data than the DMC does here. Indeed, our fits reveal some data patterns that may challenge the assumptions of the DMC (see Supplementary Material E). In particular, in our implementation, the time-to-peak parameter couples the speed at which automatic activation peaks with the speed at which it is removed. This led our fits to erroneously predict negative delta functions in data that had fast errors and slow RTs. It could be argued that this is an unfair test of the DMC, as it is designed as a model of two-choice behaviour, and the data patterns that produced poorer fits were from four-choice tasks. The DMC reproduced the data patterns from our two-choice tasks well and was able to capture individual differences in all datasets to a degree. However, we are not the first to observe that the DMC underestimates the conflict effect at slower RTs (Hübner & Töbel, 2019). Notably, Hübner and Töbel (2019) also observed negative-going delta functions in the flanker task when the onset of the flankers preceded the onset of the target. This suggests that transient activation elicited by the conflicting stimulus feature is a plausible account of both the flanker and Simon tasks, though additional flexibility may be required to model it within a common framework.
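For readers less familiar with the diagnostic, a delta function plots the congruency effect (incongruent minus congruent RT) at matched quantiles of the two RT distributions; a negative-going delta function is one in which the effect shrinks, or even reverses, at slower RTs. A minimal sketch of the computation (variable names and the synthetic data are our own, for illustration only) is:

```python
import numpy as np

def delta_function(rt_congruent, rt_incongruent,
                   probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Quantile-wise congruency effect. Returns the x-axis (mean of the
    paired quantiles) and the y-axis (incongruent minus congruent RT)."""
    qc = np.quantile(rt_congruent, probs)
    qi = np.quantile(rt_incongruent, probs)
    return (qc + qi) / 2.0, qi - qc

# Synthetic example: a congruency effect that shrinks at slower RTs
# produces a negative-going (downward-sloping) delta function.
rng = np.random.default_rng(0)
rt_c = 400.0 + rng.exponential(100.0, size=10_000)
rt_i = 0.9 * rt_c + 80.0          # effect of (80 - 0.1 * RT) ms
mean_rt, effect = delta_function(rt_c, rt_i)
```

Here `effect` decreases monotonically across the quantiles, which is exactly the pattern the DMC produced for the datasets noted above, and which the empirical fast-error/slow-RT data did not show.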

Alternative perspectives on response control
To some theoretical perspectives, it may not be surprising that parameters derived from different tasks and modalities show weak correlations. Starting with Friedman and Miyake's (2004; see also Miyake et al., 2000) influential work, many studies have used factor analysis to distinguish subtypes of response control tasks (though earlier work had made conceptual distinctions; e.g. Nigg, 2000). The three factors identified were inhibition of prepotent responses (antisaccade, Stroop, and stop-signal tasks), resistance to distractor interference (flanker, word naming, shape matching), and resistance to proactive interference (Brown-Peterson, AB-AC-AD, cued recall). We did not base our task selection on these previous taxonomies, as they do not consistently replicate. In recent revisions of their model of executive functioning, Miyake and Friedman (2017) have suggested that performance in inhibition tasks may be best explained by a more general construct, such as the ability to maintain and implement task goals. However, it could be suggested that we observe low correlations here because some of our task pairs (e.g. flanker, Simon) span different subfactors of the 2004 framework. This interpretation would not account for the low correlations we observe between more closely related tasks (Stroop, spatial Stroop), or between the blocked and intermixed Simon task variants in Dataset 3.
Beyond the individual differences context, Egner and colleagues (Egner, 2008; Egner, Delano, & Hirsch, 2007) have presented evidence for a dissociation between mechanisms underlying stimulus-based and response-based conflict. Stimulus-based conflict occurs in tasks such as the Stroop and flanker, in which participants are required to respond to a relevant stimulus feature (e.g. the font colour in the Stroop) and ignore an irrelevant feature (e.g. the written word). Here, conflict arises because the relevant and irrelevant stimulus features lie on the same dimension (word meaning). In contrast, so-called response-based conflict in the Simon task arises from an incompatibility between the irrelevant stimulus feature (location) and the response mapped to the relevant feature (e.g. responding with the right hand to a blue circle presented on the left side of the screen); stimulus colour and stimulus location do not conflict in a shared feature space. Egner et al. (2007) showed in an fMRI study that stimulus-based and response-based conflict modulated activity in parietal and premotor cortex, respectively. Further, differences in stimulus properties, task relevance, and response modality may all modulate the underlying mechanisms responsible for processing (Bompas et al., 2017; Bompas & Sumner, 2011). Using models such as the DMC to decompose performance into underlying components might reveal common principles across tasks without necessitating common neural mechanisms.

Summary and conclusions
Drawing conclusions from individual differences in response control tasks and, conversely, attempting to measure inhibition ability directly are both difficult. This difficulty is an obstacle both to theory development and to the study of neuropsychiatric disorders and socially problematic behaviours. Overall, our findings suggest that individual differences in commonly used tasks may be only weakly informative about whether common mechanisms of conflict processing exist across tasks. We observe consistent evidence that individual differences in non-conflict decision processes contribute to behavioural measures of 'inhibition' (RT costs and error costs), and that they have the power to change conclusions. We urge researchers to account for these confounds where possible.