Apache Mahout

Open-source machine learning algorithms

Apache Mahout

Developer	Apache Software Foundation
Initial release	7 April 2009; 16 years ago^[1]
Stable release	14.1 / 7 October 2020; 5 years ago^[2]
Repository	Mahout Repository
Written in	Java, Scala
Operating system	Cross-platform
Type	Machine Learning
License	Apache License 2.0
Website	mahout.apache.org

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on linear algebra. In the past, many of the implementations used the Apache Hadoop platform; today, however, the project is primarily focused on Apache Spark.^[3]^[4] Mahout also provides Java/Scala libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.^[5]

Features

Samsara

Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that lets users write R-like syntax instead of traditional Scala-like syntax. This allows users to express algorithms concisely and clearly.

val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
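The line above relies on R-like operators such as %*% (matrix product) and .t (transpose). The following sketch is not Mahout's API. It is a hypothetical, dependency-free mini-DSL in plain Scala that shows one way such operators can be defined so that the same expression compiles and evaluates on small in-memory matrices. The names RLikeSketch, Mat, VecOps and ScalarOps, and all sample values, are assumptions made for illustration; the sketch also assumes that dot denotes the inner product and cross the outer product of two vectors.

```scala
// Hypothetical mini-DSL for illustration only; this is NOT Mahout's API.
object RLikeSketch {
  type Vec = Vector[Double]

  final case class Mat(rows: Vector[Vec]) {
    // Transpose, written `.t` as in the expression above.
    def t: Mat = Mat(rows.transpose)

    // Matrix product, spelled `%*%` as in R.
    def %*%(that: Mat): Mat = {
      val cols = that.t.rows
      Mat(rows.map(r => cols.map(c => r.zip(c).map { case (a, b) => a * b }.sum)))
    }

    // Element-wise difference and sum.
    def -(that: Mat): Mat = zipWith(that)(_ - _)
    def +(that: Mat): Mat = zipWith(that)(_ + _)

    private def zipWith(that: Mat)(f: (Double, Double) => Double): Mat =
      Mat(rows.zip(that.rows).map { case (r, s) => r.zip(s).map { case (a, b) => f(a, b) } })
  }

  // Assumed meanings: `dot` is the inner product, `cross` the outer product.
  implicit class VecOps(v: Vec) {
    def dot(w: Vec): Double = v.zip(w).map { case (a, b) => a * b }.sum
    def cross(w: Vec): Mat = Mat(v.map(a => w.map(a * _)))
  }

  // Allows a scalar on the left-hand side: `x * m`.
  implicit class ScalarOps(x: Double) {
    def *(m: Mat): Mat = Mat(m.rows.map(_.map(x * _)))
  }

  // Illustrative sample values; they do not come from the article.
  val B: Mat = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
  val C: Mat = Mat(Vector(Vector(1.0, 0.0), Vector(0.0, 1.0)))
  val ksi: Vec = Vector(1.0, 2.0)
  val s_q: Vec = Vector(1.0, 1.0)

  // The same expression as above, evaluated with the operators defined here.
  val G: Mat = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)

  def main(args: Array[String]): Unit = println(G)
}
```

Because %*% starts with %, Scala gives it the same precedence as * and /, so the product is evaluated before the subtractions and the addition, matching the R reading of the expression.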

Backend agnostic

Apache Mahout's code abstracts the domain-specific language from the engine on which the code runs. While active development is done with the Apache Spark engine, users are free to implement any engine they choose; H2O and Apache Flink engines have been implemented in the past, and examples exist in the code base.
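The separation between DSL and engine can be illustrated with a minimal sketch. It is not Mahout's implementation: the names BackendSketch, Expr, Source, Gram, Engine, InCore and Partitioned are hypothetical, and the sketch assumes non-empty matrices. A logical plan describes what to compute (here the Gram matrix A.t %*% A), and each engine decides how to compute it: one in a single local pass, the other by splitting the rows into partitions and merging partial results, which loosely mimics a distributed backend.

```scala
// Hypothetical sketch of engine abstraction; this is NOT Mahout's code.
object BackendSketch {
  type Mat = Vector[Vector[Double]]

  // Logical plan: what to compute, with no reference to any engine.
  sealed trait Expr
  final case class Source(m: Mat) extends Expr
  final case class Gram(a: Expr) extends Expr // stands for A.t %*% A

  // An engine decides how a logical plan is physically executed.
  trait Engine { def eval(e: Expr): Mat }

  // Element-wise sum of two matrices of equal shape.
  private def add(x: Mat, y: Mat): Mat =
    x.zip(y).map { case (r, s) => r.zip(s).map { case (a, b) => a + b } }

  // Sum of the outer products of the rows, which equals A.t %*% A.
  private def gramOfRows(rows: Mat): Mat =
    rows.map(r => r.map(a => r.map(a * _))).reduce(add)

  // Engine 1: evaluate everything in one pass over local memory.
  object InCore extends Engine {
    def eval(e: Expr): Mat = e match {
      case Source(m) => m
      case Gram(a)   => gramOfRows(eval(a))
    }
  }

  // Engine 2: compute a partial result per row partition, then merge.
  final class Partitioned(rowsPerPart: Int) extends Engine {
    def eval(e: Expr): Mat = e match {
      case Source(m) => m
      case Gram(a)   => eval(a).grouped(rowsPerPart).map(gramOfRows).reduce(add)
    }
  }

  // Illustrative input; the same plan is handed to both engines.
  val plan: Expr = Gram(Source(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0), Vector(5.0, 6.0))))

  def main(args: Array[String]): Unit = {
    println(InCore.eval(plan))
    println(new Partitioned(2).eval(plan))
  }
}
```

Both engines return the same matrix for the same plan, which is the property a backend-agnostic design depends on: user code is written once against the plan, and only the engine changes.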

GPU/CPU accelerators

Numerical computation on the JVM is comparatively slow. To improve speed, "native solvers" were added, which move in-core (and, by extension, distributed) BLAS operations out of the JVM, offloading them to off-heap or GPU memory for processing on multiple CPUs or CPU cores, or on GPUs, when built against the ViennaCL library.^[6] ViennaCL is a highly optimized C++ library with BLAS operations implemented in OpenMP and OpenCL. As of release 14.1, the OpenMP build is considered stable, while the OpenCL build is still in an experimental proof-of-concept phase.

Recommenders

Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated Co-Occurrence, a recommender algorithm unique to Mahout that extends co-occurrence to multiple dimensions of data.
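The basic idea behind a co-occurrence recommender can be sketched in a few lines of plain Scala. This is a simplified illustration, not Mahout's algorithm or API: the object CooccurrenceSketch, the toy interaction data and the scoring rule are assumptions, and the sketch uses raw co-occurrence counts with no significance weighting. Two items co-occur when the same user has interacted with both; an unseen item is scored for a user by summing its co-occurrence counts with the items that user has already seen.

```scala
// Simplified illustration of co-occurrence scoring; NOT Mahout's implementation.
object CooccurrenceSketch {
  // Hypothetical toy data: which items each user interacted with.
  val history: Map[String, Set[String]] = Map(
    "alice" -> Set("a", "b"),
    "bob"   -> Set("a", "b", "c"),
    "carol" -> Set("b", "c"),
    "dave"  -> Set("a")
  )

  // cooc((i, j)) = number of users who interacted with both i and j (i != j).
  val cooc: Map[(String, String), Int] =
    history.values.toSeq
      .flatMap(items => for (i <- items.toSeq; j <- items.toSeq if i != j) yield (i, j))
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }

  // Score each unseen item by its summed co-occurrence with the user's items,
  // then return the top n, highest score first (ties broken by item name).
  def recommend(user: String, n: Int): Seq[(String, Int)] = {
    val seen = history.getOrElse(user, Set.empty[String])
    val candidates = history.values.flatten.toSet -- seen
    candidates.toSeq
      .map(c => c -> seen.toSeq.map(s => cooc.getOrElse((s, c), 0)).sum)
      .filter { case (_, score) => score > 0 }
      .sortBy { case (item, score) => (-score, item) }
      .take(n)
  }

  def main(args: Array[String]): Unit = {
    println(recommend("alice", 5))
    println(recommend("dave", 5))
  }
}
```

In this toy data, dave has only seen item a, which co-occurs with b for two users and with c for one, so b ranks above c for dave.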

History

Transition from MapReduce to Apache Spark

While Mahout's core algorithms for clustering, classification and batch-based collaborative filtering were implemented on top of Apache Hadoop using the MapReduce paradigm, the project did not restrict contributions to Hadoop-based implementations. Contributions that run on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop.

Starting with release 0.10.0, the project shifted its focus to building a backend-independent programming environment, code-named "Samsara".^[7]^[8]^[9] The environment consists of an algebraic backend-independent optimizer and an algebraic Scala DSL that unifies in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink.^[citation needed] Support for MapReduce algorithms began to be phased out in 2014.^[10]

Release history

Version	Release date	Notes
0.1	2009-04-07
0.2	2009-11-18
0.3	2010-03-17
0.4	2010-10-31
0.5	2011-05-27
0.6	2012-02-06
0.7	2012-05-16
0.8	2013-07-25
0.9	2014-02-01
0.10.0	2015-04-11	Samsara DSL
0.10.1	2015-05-31
0.10.2	2015-08-06
0.11.0	2015-08-07
0.11.1	2015-11-06
0.11.2	2016-03-11
0.12.0	2016-04-11	Added Apache Flink engine
0.12.1	2016-05-19
0.12.2	2016-06-13
0.13.0	2017-04-17
0.14.0	2019-03-07	Source only (no binaries)
14.1	2020-10-07

Developers

Apache Mahout is developed by a community of contributors. The project is managed by a group called the "Project Management Committee" (PMC). The current PMC consists of Andrew Musselman, Andrew Palumbo, Drew Farris, Isabel Drost-Fromm, Jake Mannix, Pat Ferrel, Paritosh Ranjan, Trevor Grant, Robin Anil, Sebastian Schelter, and Stevo Slavić.^[11]

References

1. "Apache Mahout: First release 0.1 released".
2. "Apache Mahout: Scalable machine learning and data mining". Retrieved 6 March 2019.
3. "Introducing Apache Mahout". ibm.com. 2011. Retrieved 13 September 2011.
4. "InfoQ: Apache Mahout: Highly Scalable Machine Learning Algorithms". infoq.com. 2011. Retrieved 13 September 2011.
5. "Algorithms - Apache Mahout - Apache Software Foundation". cwiki.apache.org. 2011. Archived from the original on 22 December 2013. Retrieved 13 September 2011.
6. "Extending Mahout Samsara to GPU Clusters". Archived from the original on 3 November 2020. Retrieved 29 October 2020.
7. "Mahout-Samsara's In-Core Linear Algebra DSL Reference". Archived from the original on 2 August 2016. Retrieved 29 February 2016.
8. "Mahout-Samsara's Distributed Linear Algebra DSL Reference". Archived from the original on 2 August 2016. Retrieved 29 February 2016.
9. "Mahout 0.10.x: first Mahout release as a programming environment". www.weatheringthroughtechdays.com. Archived from the original on 9 October 2016. Retrieved 29 February 2016.
10. "MAHOUT-1510 ("Good-bye MapReduce")".
11. "Apache Committee Information".

External links

Official website (mahout.apache.org)
