Hydrological models perform significantly worse in arid areas than in other climates. The reasons are numerous, ranging from the scales involved to specific processes and the lack of adequate measurements. Effective parameters have often been observed to change between runoff events, which limits the predictive capacity of the models. We examine the problems that arise in an operational setting and present an analysis to improve the understanding of the errors. Our method characterizes the conditions under which the model fails systematically and the conditions under which the parameterization holds between floods. We applied KINEROS2 to 24 years of radar rainfall and streamflow data in six arid catchments. A GLUE probabilistic framework is used to characterize model performance, and a method is developed to identify floods with similar calibration. The analysis shows that uninformative conditions are difficult to generalize. A basin-specific analysis can help to identify conditions where the model fails and to exclude them from calibration. Despite the large uncertainties, similar catchments display groups of floods with similar parameterization. In some basins, we find that it is important to quantify antecedent moisture conditions. Hydrological models show some consistency within limited conditions. These conditions, however, depend on the errors involved and are site-specific. There is some potential for parameter transfer, but proximity alone might not be enough, and other factors, such as mean annual rainfall or storm type, should be taken into account.