mrjob v0.2.5 released

Jimmy Retzlaff jimmy at retzlaff.com
Sat Apr 30 13:20:44 EDT 2011


What is mrjob?
-----------------------
mrjob is a Python package that helps you write and run Hadoop Streaming jobs.
mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which
allows you to buy time on a Hadoop cluster on an hourly basis. It also
works with your own Hadoop cluster.
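
For readers new to Hadoop Streaming, here is a rough sketch of the kind of
job mrjob helps you write: a word count whose mapper and reducer exchange
tab-separated "key<TAB>value" lines. This is plain Python, not mrjob's API.
The names mapper, reducer and run_locally are made up for illustration, and
run_locally only imitates Hadoop's sort-by-key step in memory.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word.lower(), 1)


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def run_locally(lines):
    """Hypothetical stand-in for Hadoop: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_locally(["the quick fox", "The lazy dog"]):
        print(out)
```

On a real cluster the mapper and reducer would run as separate processes
reading stdin and writing stdout, with Hadoop doing the sort in between.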

Some important features:
 * Run jobs on EMR, your own Hadoop cluster, or locally (for testing).
 * Write multi-step jobs (one map-reduce step feeds into the next)
 * Duplicate your production environment inside Hadoop
   * Upload your source tree and put it in your job's $PYTHONPATH
   * Run make and other setup scripts
   * Set environment variables (e.g. $TZ)
   * Easily install Python packages from tarballs (EMR only)
   * Setup handled transparently by mrjob.conf config file
 * Automatically interpret error logs from EMR
 * SSH tunnel to Hadoop job tracker on EMR
 * Minimal setup
   * To run on EMR, set $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY
   * To run on your Hadoop cluster, install simplejson and make
     sure $HADOOP_HOME is set.
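
The environment variables named under "Minimal setup" can be checked before
launching a job. The sketch below is one way to do that using only the
standard library. The helper missing_settings and its "emr"/"hadoop" labels
are assumptions made for this example, not part of mrjob, and it does not
check whether simplejson is installed.

```python
import os


def missing_settings(runner, environ=None):
    """Return the required environment variables that are unset or empty.

    `runner` is "emr" or "hadoop". Both this function and these labels are
    hypothetical helpers for this sketch, not part of mrjob.
    """
    environ = os.environ if environ is None else environ
    required = {
        "emr": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
        "hadoop": ["HADOOP_HOME"],
    }[runner]
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    for runner in ("emr", "hadoop"):
        missing = missing_settings(runner)
        print(runner, "ok" if not missing else "missing: " + ", ".join(missing))
```

Passing a dict as `environ` makes the check easy to try without touching
your real environment.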
More info:
 * Install mrjob: python setup.py install
 * Documentation: http://packages.python.org/mrjob/
 * PyPI: http://pypi.python.org/pypi/mrjob
 * Discussion: http://groups.google.com/group/mrjob
 * Development is hosted on GitHub: http://github.com/Yelp/mrjob

What's new?
-------------------
v0.2.5, 2011-04-29 -- Hadoop input and output formats
 * Added hadoop_input/output_format options
 * You can now specify a custom Hadoop streaming jar (hadoop_streaming_jar)
 * Extra args to hadoop now come before -mapper/-reducer on EMR, so
   that e.g. -libjars will work (worked in hadoop mode since v0.2.2)
 * Hadoop mode now supports s3n:// URIs (Issue #53)