[フレーム]
BT

InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

View an example

We protect your privacy.

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Unlock the full InfoQ experience

Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources.

Log In
or

Don't have an InfoQ account?

Register
  • Stay updated on topics and peers that matter to youReceive instant alerts on the latest insights and trends.
  • Quickly access free resources for continuous learningMinibooks, videos with transcripts, and training materials.
  • Save articles and read at anytimeBookmark articles to read whenever youre ready.

Topics

Choose your language

InfoQ Homepage News Mahout to Get Self-Optimizing Matrix Algebra Interface with Pluggable Backends for Spark and Flink

Mahout to Get Self-Optimizing Matrix Algebra Interface with Pluggable Backends for Spark and Flink

This item in japanese

Nov 21, 2014 2 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.
Get in touch

At the recent GOTO conference in Berlin, Mahout committer Sebastian Schelter outlined recent advances in Mahout's ongoing effort to create a scalable foundation for data analysis that is as easy to use as R or Python.

Schelter said the main goal is to provide a simple Scala based DSL (domain specific language) similar to the matrix notation in languages like R, but with the possibility to store large matrices distributed over a cluster and run computations in parallel.

The resulting library, Schelter said, will provide local and distributed matrices that can be used with one another seamlessly. The Mahout team designed the library such that it is not tied to a specific platform but will rather have a pluggable backend that can target different platforms.

Currently, development for Apache Spark has progressed most, but Apache Flink, another incubating next generation Big Data platform, will also be considered, Schelter said.

One key aspect of this new architecture is also that it is possible to treat operations differently based, for example, on the matrix sizes involved, leading to far reaching optimization potential. The main design goal according to Schelter is to allow data scientists to write scalable code without having to worry too much about parallelization. A demo page gives a first impression of the resulting interface.

Apache Mahout originally started as a project to implement a number of machine learning algorithms on top of Hadoop. It covers algorithms for classification, clustering, recommendation, and learning document models. So far these algorithms were based on Hadoop and the MapReduce computation model, instead of the more flexible models like the one provided by the Apache Spark project. Apache Spark has begun to develop its own machine learning library called mllib, which covers fewer algorithms than Mahout right now but claims on their project homepage to be much faster due to improvements like moving computations into memory and better support for iterative algorithms.

The move of Mahout away from relying on MapReduce alone comes at a time when there is a wide variety of alternative approaches to distributed computing.

Google itself has started to explore alternative computation schemes already some time ago, including Percolator, which allowed for more incremental updates of the Google search data base, and Pregel, a system built for distributed execution of graph computations. Pregel has in turn lead to open source projects like Apache Giraph and Stanford's GPS.

A further alternative toolbox which provides a wide variety of distributed implementations of machine learning algorithms is GraphLab developed at the Carnegie Mellon University.

Rate this Article

Adoption
Style

Related Content

The InfoQ Newsletter

A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

We protect your privacy.

BT

AltStyle によって変換されたページ (->オリジナル) /