Hadoop HDFS: HDFS-503

Implement erasure coding as a layer on HDFS


Details

  • Type: New Feature
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 0.21.0
  • Component/s: contrib/raid
  • Labels: None
  • Hadoop Flags: Reviewed
  • Release Note:
    This patch implements an optional layer over HDFS that provides offline erasure coding. It can be used to reduce the total storage requirements of DFS.

Description

The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the total storage is huge. One idea is to reduce the replication factor and erasure-code a set of blocks so that the overall probability of losing a block remains the same as before.
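One rough way to see this trade-off is to compare raw-storage overhead and per-block loss probability for 3-way replication versus a lower replication factor plus one XOR parity block per stripe. The sketch below is a toy model, not taken from the attached patches: it assumes every replica fails independently with a made-up probability p, that a stripe holds k data blocks plus one parity block, and that the parity block is kept at the same replication factor as the data. The function names are hypothetical. Real clusters see correlated failures and finite repair windows, so the numbers are illustrative only.

```python
"""Toy comparison: 3-way replication vs. reduced replication + XOR parity.

Assumptions (illustrative only, hypothetical names): each replica fails
independently with probability p; a stripe has `stripe_len` data blocks
plus one parity block; the parity block uses the same replication factor.
"""
from typing import Optional


def storage_overhead(replication: int, stripe_len: Optional[int] = None) -> float:
    """Raw bytes stored per byte of user data."""
    if stripe_len is None:
        return float(replication)
    # stripe_len data blocks + 1 parity block, each stored `replication` times.
    return replication * (stripe_len + 1) / stripe_len


def loss_prob_replication(p: float, replication: int) -> float:
    """With plain replication, a block is lost only if all replicas fail."""
    return p ** replication


def loss_prob_xor(p: float, replication: int, stripe_len: int) -> float:
    """A data block is unrecoverable if all of its replicas fail AND at
    least one other stripe member (stripe_len - 1 data blocks + 1 parity
    block) has also lost all of its replicas, so XOR cannot rebuild it."""
    q = p ** replication          # probability one stripe member is gone
    others = stripe_len           # (stripe_len - 1) data + 1 parity
    return q * (1.0 - (1.0 - q) ** others)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-replica failure probability
    k = 10    # hypothetical stripe length
    print(f"3x replication  : overhead {storage_overhead(3):.2f}, "
          f"loss {loss_prob_replication(p, 3):.1e}")
    print(f"2x + XOR (k={k}): overhead {storage_overhead(2, k):.2f}, "
          f"loss {loss_prob_xor(p, 2, k):.1e}")
```

With p = 0.01 and k = 10, the 2x + XOR layout needs 2.2 bytes of raw storage per user byte instead of 3, and its modeled loss probability (about 1e-7) is below that of 3-way replication (1e-6). Under this toy model, the replication factor can be lowered without raising the chance of losing a block.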

Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Recent research from CMU has also described DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
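The simplest erasure code is a single parity block computed with XOR. The following minimal sketch (hypothetical function names; not necessarily what the attached patches do) encodes a stripe of equal-length blocks into one parity block and rebuilds any single missing block from the survivors. Reed-Solomon codes generalize this idea to tolerate several lost blocks per stripe, at the cost of more complex arithmetic.

```python
"""Single-parity (XOR) erasure code over a stripe of equal-sized blocks.

Minimal sketch: blocks are plain `bytes` of equal length (a real system
would pad a short final block). The parity block is the byte-wise XOR of
all data blocks, so any ONE missing block can be rebuilt.
"""
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(stripe: List[bytes]) -> bytes:
    """Compute the parity block for a non-empty stripe of data blocks."""
    return reduce(xor_blocks, stripe)


def reconstruct(stripe: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing (None) data block from the survivors."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover only one lost block per stripe")
    if not missing:
        return list(stripe)
    survivors = [blk for blk in stripe if blk is not None]
    # parity ^ (all surviving blocks) leaves exactly the missing block.
    rebuilt = reduce(xor_blocks, survivors, parity)
    repaired = list(stripe)
    repaired[missing[0]] = rebuilt
    return repaired


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    damaged = [data[0], None, data[2]]  # simulate losing block 1
    print("recovered:", reconstruct(damaged, parity)[1])
```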

I propose that we discuss implementation strategies that are not part of base HDFS but instead form a layer on top of HDFS.

Attachments

  1. raid1.txt
    162 kB
    Dhruba Borthakur
  2. raid2.txt
    174 kB
    Dhruba Borthakur

Issue Links

blocks

  • HDFS-600 Support for pluggable erasure coding policy for HDFS (Improvement, Major, Resolved)

is depended upon by

  • HDFS-582 Create a fsckraid tool to verify the consistency of erasure codes for HDFS-503 (Improvement, Major, Resolved)

relates to

  • HDFS-7285 Erasure Coding Support inside HDFS (New Feature, Major, Resolved)
  • MAPREDUCE-1837 Raid should store the metadata in HDFS (Improvement, Major, Resolved)
  • MAPREDUCE-2036 Enable Erasure Code in Tool similar to Hadoop Archive (Improvement, Minor, Resolved)


People

Assignee: Dhruba Borthakur
Reporter: Dhruba Borthakur
Votes: 0
Watchers: 39

Dates

Created:
Updated:
Resolved: