Looking beyond general metrics for model comparison