Apache DataFu Pig - Guide

Hashing

MD5

The MD5 hash of a string can be computed with the MD5 UDF.

For example:

define MD5 datafu.pig.hash.MD5();
--input: "hello world!"
data_in = LOAD 'input' as (val:chararray);
data_out = FOREACH data_in GENERATE MD5(val) as val;
-- produces: (fc3ff98e8c6a0d3087d515c0473f8677)
DUMP data_out;

The function can instead output base64 by passing 'base64' to the constructor. The default is 'hex' for hexadecimal.

define MD5 datafu.pig.hash.MD5('base64');
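
To make the difference between the two encodings concrete, here is a minimal sketch that computes both forms side by side. It assumes the DataFu Pig jar is already registered and reuses the 'input' file from the example above. The aliases MD5Hex and MD5Base64 and the field names are illustrative, not part of DataFu.

```pig
-- Illustrative aliases: the same UDF class, constructed with different encodings.
define MD5Hex datafu.pig.hash.MD5();
define MD5Base64 datafu.pig.hash.MD5('base64');

data_in = LOAD 'input' as (val:chararray);

-- Both columns encode the same 128-bit digest; only the text encoding differs.
data_out = FOREACH data_in GENERATE
    val,
    MD5Hex(val) as md5_hex,
    MD5Base64(val) as md5_base64;

DUMP data_out;
```

Since an MD5 digest is 16 bytes, the hexadecimal column should be 32 characters long, while the base64 column is shorter.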

SHA

A SHA hash can be computed with the SHA UDF. The output is in hexadecimal.

define SHA datafu.pig.hash.SHA();
--input: "hello world!"
data_in = LOAD 'input' as (val:chararray);
data_out = FOREACH data_in GENERATE SHA(val) as val;
-- produces: (7509e5bda0c762d2bac7f90d758b5b2263fa01ccbc542ab5e3df163be08e6ca9)
DUMP data_out;

By default this uses SHA-256. The constructor also takes an optional parameter for the particular SHA algorithm to use. To use SHA-512 instead:

define SHA512 datafu.pig.hash.SHA('512');
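
A minimal sketch of using the default and the SHA-512 variant together, assuming the DataFu Pig jar is already registered and the same 'input' file as in the example above. The alias SHA256 and the output field names are illustrative choices, not names defined by DataFu.

```pig
-- Default constructor: SHA-256.
define SHA256 datafu.pig.hash.SHA();
-- Optional constructor parameter selects the algorithm: here SHA-512.
define SHA512 datafu.pig.hash.SHA('512');

data_in = LOAD 'input' as (val:chararray);

-- Emit the original value alongside both hexadecimal digests.
data_out = FOREACH data_in GENERATE
    val,
    SHA256(val) as sha256_hex,
    SHA512(val) as sha512_hex;

DUMP data_out;
```

Because the output is hexadecimal, the SHA-256 column should be 64 characters long and the SHA-512 column 128 characters long, which is a quick way to confirm which algorithm a given alias is using.
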
Copyright © 2011-2025 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache DataFu, DataFu, Apache Pig, Apache Hadoop, Hadoop, Apache, and the Apache feather logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and other countries.