What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection