
This notebook was prepared by Donne Martin. Source and license info is on GitHub.

HDFS

Run an HDFS command:

In [ ]:
!hdfs

Run a file system command through the HDFS file system shell (FsShell):

In [ ]:
!hdfs dfs

List the user's home directory:

In [ ]:
!hdfs dfs -ls

List the HDFS root directory:

In [ ]:
!hdfs dfs -ls /

Copy a local file to the user's directory on HDFS:

In [ ]:
!hdfs dfs -put file.txt file.txt

Display the contents of the specified HDFS file:

In [ ]:
!hdfs dfs -cat file.txt

Print the last 10 lines of the file to the terminal:

In [ ]:
!hdfs dfs -cat file.txt | tail -n 10
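
In the pipeline above, the whole file is streamed out of HDFS and `tail` keeps only the final lines. A minimal Python sketch of that "keep the last n lines of a stream" step follows. The helper name `last_lines` and the sample data are hypothetical, used only for illustration, and no HDFS access is involved:

```python
from collections import deque


def last_lines(lines, n=10):
    """Return the last n items of an iterable of lines, similar to `tail -n`.

    A deque with maxlen=n discards older items as new ones arrive, so only
    n lines are held in memory regardless of the input size.
    """
    return list(deque(lines, maxlen=n))


# Hypothetical sample input standing in for the lines of file.txt.
sample = ["line %d" % i for i in range(1, 26)]
print(last_lines(sample))
```

Any iterable of lines works as input, for example an open file object.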

View the contents of all files in a directory, paged with less:

In [ ]:
!hdfs dfs -cat dir/* | less

Copy an HDFS file to local:

In [ ]:
!hdfs dfs -get file.txt file.txt

Create a directory on HDFS:

In [ ]:
!hdfs dfs -mkdir dir

Recursively delete the specified directory and all of its contents:

In [ ]:
!hdfs dfs -rm -r dir
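
The same file system shell commands can also be issued from plain Python rather than through the notebook's `!` escape. The sketch below is one possible way to do that, assuming the `hdfs` executable is on the PATH. The helper names `hdfs_dfs_args` and `run_hdfs_dfs` are hypothetical and not part of Hadoop or of this notebook:

```python
import shutil
import subprocess


def hdfs_dfs_args(command, *args):
    """Build the argument list for `hdfs dfs -<command> <args...>`."""
    return ["hdfs", "dfs", "-" + command, *args]


def run_hdfs_dfs(command, *args):
    """Run the command and return its stdout.

    Returns None when no `hdfs` executable is found on the PATH, so the
    sketch can be imported on a machine without Hadoop installed.
    """
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        hdfs_dfs_args(command, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Argument lists matching some of the commands used in this notebook.
ls_root = hdfs_dfs_args("ls", "/")
put_file = hdfs_dfs_args("put", "file.txt", "file.txt")
rm_dir = hdfs_dfs_args("rm", "-r", "dir")
print(ls_root)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.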

Specify an HDFS file in Spark (relative paths resolve against the user's HDFS home directory; the example uses a fully qualified URI):

In [ ]:
data = sc.textFile("hdfs://hdfs-host:port/path/file.txt")
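
The URI has the form `hdfs://<host>:<port><absolute path>`. As a small sketch, the string can be assembled from its parts before it is handed to Spark. The helper `hdfs_uri` is hypothetical, and port 8020 is only an assumed placeholder; the real host and port come from the cluster's NameNode configuration:

```python
def hdfs_uri(host, port, path):
    """Assemble a fully qualified HDFS URI from host, port, and absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /path/file.txt")
    return "hdfs://{}:{}{}".format(host, port, path)


# Assumed example values; replace them with the cluster's actual NameNode address.
uri = hdfs_uri("hdfs-host", 8020, "/path/file.txt")
print(uri)

# With a live SparkContext named sc, the URI would be used as:
# data = sc.textFile(uri)
```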
