Scalable machine learning algorithms on a big data infrastructure