• Command-line interface • Programmatic access
Command-line interface: hadoop fs/hdfs dfs
Introduction
• There are many other interfaces to HDFS, e.g., Web UI (http://namenode.fqdn:50070)
• The command line is one of the simplest and, to many developers, the most familiar
• CLI entry points:
$ hadoop fs [cmd …]   → generic entry point; works with any Hadoop-supported filesystem
$ hdfs dfs [cmd …]    → HDFS-specific; the current standard (supersedes the old, deprecated hadoop dfs)
$ hdfs dfs -help [cmd …]    Show long help message for given command.
$ hdfs dfs -usage [cmd …]
HDFS CLI: Introduction
• These are powerful commands; use -help and -usage when in doubt!
– Ditch the dash (-) for individual commands; e.g., for command -mkdir:
$ hdfs dfs -help -mkdir ×
$ hdfs dfs -help mkdir
-usage shows just a short message with the usage syntax for the given command.
Abbreviated list of available HDFS CLI commands:
– -mkdir [-p] <path> …
– -cp <src> … <dst>
– -rm [-r|-R] [-f] [-skipTrash] <path> …
– -find <path> … <expression> …
– -put <localsrc> … <dst>
– -copyFromLocal <localsrc> … <dst>
– -df [-h] [<path> …]
– -du [-s] [-h] <path> …
– -getmerge [-nl] <src> <localdst>
– -copyToLocal [-ignoreCrc] [-crc] <src> … <localdst>
– -moveToLocal [-crc] <src> <localdst>
– -cat <src> …
– -text <src> …
– -tail [-f] <file>
– -setfacl [-R] [-b|-k|-m|-x <acl_spec>] <path> | --set <acl_spec> <path>
– -chmod [-R] <MODE[,MODE]… | OCTALMODE> <path> …
• Upload file(s):
$ hdfs dfs -put/-copyFromLocal <localsrc> … <dst>
• Download file(s):
$ hdfs dfs -get/-copyToLocal [-ignoreCrc|-crc] <src> … <localdst>
• Download and merge file(s):
$ hdfs dfs -getmerge [-nl] <src> <localdst>
$ hdfs dfs -copyFromLocal lyrics.txt
Copy to home directory by default
$ hdfs dfs -copyToLocal lyrics.txt lyrics.copy.txt
Copy back to local
$ md5 lyrics.txt lyrics.copy.txt
MD5 (lyrics.txt) = cd81d26a3c5a40e3a497adafb6c88617
MD5 (lyrics.copy.txt) = cd81d26a3c5a40e3a497adafb6c88617
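The same integrity check can also be done in code. A minimal pure-Java sketch using the standard java.security.MessageDigest (the class name Md5Check and the sample strings are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Hex-encoded MD5 digest of a string, matching what the md5 tool prints
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        String original = "some lyrics";
        String copy = new String(original); // stands in for the downloaded copy
        // Identical content ⇒ identical digests, as in the CLI check above
        System.out.println(md5Hex(original).equals(md5Hex(copy))); // prints "true"
    }
}
```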
$ hdfs dfs -copyFromLocal ~/lyrics.txt hdfs://namenode:8020/user/username/
$ hdfs dfs -copyFromLocal ~/lyrics.txt /user/username/
Filesystem URI specified explicitly // OR // taken from the config. file (fs.defaultFS)
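The way a path is combined with fs.defaultFS mirrors standard URI resolution. A small sketch with java.net.URI (the class name UriResolve and the hostname namenode are placeholders):

```java
import java.net.URI;

public class UriResolve {
    // Resolve a path against a base URI, the way HDFS resolves
    // relative/absolute paths against fs.defaultFS
    static String resolve(String base, String path) {
        return URI.create(base).resolve(path).toString();
    }

    public static void main(String[] args) {
        // An absolute path picks up scheme and authority from the base,
        // so both command forms above name the same file
        System.out.println(resolve("hdfs://namenode:8020/", "/user/username/lyrics.txt"));
        // → hdfs://namenode:8020/user/username/lyrics.txt
    }
}
```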
• Create directory:
$ hdfs dfs -mkdir [-p] <path> …
$ hdfs dfs -mkdir books
$ hdfs dfs -ls .
Directories are treated as metadata and stored by the namenode, not the datanodes
Found 3 items
drwx------   - nikos nikos    0 2015-03-04 08:57 /user/username/.staging
drwxr-xr-x   - nikos nikos    0 2016-01-21 23:25 /user/username/books
-rw-r--r--   3 nikos nikos  630 2016-01-21 23:13 /user/username/lyrics.txt
Columns: permissions, replication factor ("-" for directories), file owner & group, size, last-modified date & time, absolute path.
The replication factor is an attribute you won't find in traditional Unix/Linux filesystems.
• Copy/move around a single HDFS namespace:
$ hdfs dfs -cp <src> … <dst>
$ hdfs dfs -mv <src> … <dst>
These commands also allow multiple sources, in which case the destination must be a directory.
$ hdfs dfs -cp /user/username/file1 /user/username/file2
$ hdfs dfs -cp file1 file2 dir
• Delete HDFS file/directory:
$ hdfs dfs -rm [-f] [-r|-R] [-skipTrash] <path> …
$ hdfs dfs -rmdir [--ignore-fail-on-non-empty] <dir> …
• Empty HDFS "recycle bin":
$ hdfs dfs -expunge
Remember !!!
-rm deletes files (add -r|-R for directories and their contents); -rmdir only deletes empty directories.
First things first: Get a reference to the underlying filesystem.
abstract class org.apache.hadoop.fs.FileSystem
→ org.apache.hadoop.fs.LocalFileSystem
→ org.apache.hadoop.fs.DistributedFileSystem
→ org.apache.hadoop.fs.HftpFileSystem
→ org.apache.hadoop.fs.FTPFileSystem
→ org.apache.hadoop.fs.s3.S3FileSystem
→ org.apache.hadoop.fs.kfs.KosmosFileSystem
Note: In general, you should strive to write your code against the FileSystem abstract class, to retain portability across filesystems.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode.fqdn:8020"); // Not necessary when appropriate
                                                       // config. files exist on the client side
FileSystem fs = FileSystem.get(conf);
[…]
fs.close();

• Configuration is just a wrapper around java.util.Properties
• Configuration's default constructor loads settings from the local config. files
• Use an explicit namenode URI (the conf.set(…) call above) if executing code on a remote cluster and no such files have been provided
• Use conf.addResource(…) to load non-standard config. (XML) files
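Since Configuration is essentially a Properties wrapper, its get/set-with-default behaviour can be sketched with plain java.util.Properties (the class name PropsDemo and the property values here are illustrative, not Hadoop's actual defaults):

```java
import java.util.Properties;

public class PropsDemo {
    // Mimics Configuration.get(key, defaultValue) with plain java.util.Properties
    static String get(String key, String def) {
        Properties props = new Properties();
        props.setProperty("fs.defaultFS", "hdfs://namenode.fqdn:8020"); // like conf.set(…)
        return props.getProperty(key, def);
    }

    public static void main(String[] args) {
        System.out.println(get("fs.defaultFS", "file:///")); // explicitly set value wins
        System.out.println(get("dfs.replication", "3"));     // falls back to the default
    }
}
```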
• Second step: get a "reference" to a file/directory path
– org.apache.hadoop.fs.Path
– Path f = new Path("file:///user/username/file.txt");
– Path f = new Path("hdfs://localhost:8020/user/username/file.txt");
– Path f = new Path("/user/username/file.txt");
– Path f = new Path("file.txt");
boolean isCreated = fs.createNewFile(file); // Create empty file; does not overwrite file.txt
// … or …
FSDataOutputStream out = fs.create(file);   // Create/overwrite file
Directory/File Creation
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.permission.FsPermission;
[…]
Path dir = new Path("/path/to/dir");
if (fs.mkdirs(dir) == false) {   // Create directory structure, if not there
    // Error creating directory structure. Bail out…
}
Path file = new Path("/path/to/dir/file.txt"); // Absolute path
// … or …
Path file = new Path(dir, "file.txt");         // Filename relative to dir !!
if (fs.exists(file)) {
    // File already exists. Do something…
}
FSDataOutputStream out = fs.create(file, false); // Do not overwrite
// … or …
FSDataOutputStream out = fs.create(file, new FsPermission("u=rw,g=r,o-rwx"),
        true, 1048576, (short)2, 134217728L, null); // ?!?
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataInputStream;

InputStream in = fs.open(file);
// … or …
FSDataInputStream in = fs.open(file); // needed for the read*() methods below

String str = in.readUTF();
int i = in.readInt();
double d = in.readDouble();
in.close();
• Also check out org.apache.commons.io.IOUtils
import org.apache.hadoop.fs.FSDataOutputStream;
[…]
FSDataOutputStream out = fs.create(file);
out.writeUTF("1024");
out.writeInt(1024);
out.writeDouble(1.0);
out.close();
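FSDataOutputStream and FSDataInputStream extend Java's DataOutputStream/DataInputStream, so the write/read discipline is the same: read values back in exactly the order they were written. A local round-trip sketch with in-memory streams (the class name DataStreamRoundTrip is made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataStreamRoundTrip {
    // Write the values from the slide, then read them back in write order
    static String roundTrip() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF("1024");
            out.writeInt(1024);
            out.writeDouble(1.0);
            out.close();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            String s = in.readUTF() + ":" + in.readInt() + ":" + in.readDouble();
            in.close();
            return s;
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory streams don't fail
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints "1024:1024:1.0"
    }
}
```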
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Here's the result of running it on a small file:
% hadoop FileSystemCat hdfs://localhost/user/username/lyrics.txt
Mesmerises one of the wedding guests. Stay here and listen to the nightmares of the Sea.
And the music plays on, as the bride passes by. Caught by his spell and the Mariner tells his tale.
Driven south to the land of the snow and ice, to a place where nobody's been.
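What IOUtils.copyBytes(in, out, 4096, false) does is just a buffered copy loop. A simplified pure-Java stand-in (class and method names are illustrative, not Hadoop's implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyBytesSketch {
    // Simplified stand-in for IOUtils.copyBytes(in, out, bufSize, close)
    static void copyBytes(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
        }
    }

    // Convenience wrapper: push a string through the byte-copy loop
    static String copyString(String s) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyBytes(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)), out, 4096);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyString("Mesmerises one of the wedding guests."));
    }
}
```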
FSDataInputStream in = fs.open(file);
in.seek(1024);                    // Go to position 1024 (in bytes)
String someString = in.readUTF(); // Read a string
in.skip(1024);                    // Skip next 1024 bytes
long position = in.getPos();      // Get current position (returns a long)
in.seek(0);                       // Go back to start
• HDFS is built for streaming access ⇒ seeking is expensive; use it sparingly!
• Seeking is not available for writes ⇒ only append is supported…
FSDataOutputStream out = fs.append(file);
out.writeUTF("some text");
… but append is not supported by all FileSystem subclasses!
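The seek()/getPos() interface behaves like random access on a local file. A local sketch with java.io.RandomAccessFile standing in for FSDataInputStream (the class name SeekSketch is made up):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekSketch {
    // Local stand-in for FSDataInputStream's seek()/getPos() interface
    static String seekDemo() {
        try {
            File tmp = File.createTempFile("seek", ".bin");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.writeUTF("some text");       // write a UTF string at offset 0
                raf.seek(0);                     // go back to the start
                String s = raf.readUTF();        // read it back
                long pos = raf.getFilePointer(); // current position, like getPos()
                return s + ":" + pos;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // writeUTF stores a 2-byte length prefix + 9 UTF-8 bytes = position 11
        System.out.println(seekDemo()); // prints "some text:11"
    }
}
```

On a local disk such seeks are cheap; on HDFS each seek may cross block and datanode boundaries, which is why streaming access is preferred.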
public class FileSystemDoubleCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
            in.seek(0); // go back to the start of the file
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
Here’s a mock result of running it on a small file:
% hadoop FileSystemDoubleCat hdfs://localhost/user/username/lyrics.txt
Mesmerises one of the wedding guests. Stay here and listen to the nightmares of the Sea.
And the music plays on, as the bride passes by. Caught by his spell and the Mariner tells his tale. Driven south to the land of the snow and ice, to a place where nobody’s been.
Mesmerises one of the wedding guests. Stay here and listen to the nightmares of the Sea.
And the music plays on, as the bride passes by. Caught by his spell and the Mariner tells his tale. Driven south to the land of the snow and ice, to a place where nobody’s been.
Path file = new Path(dir, "file.txt");
// Move file.txt to …/other/dir and rename it
Path dst = new Path("/path/to/other/dir/otherfile.txt");
fs.rename(file, dst);
// Delete the new file
fs.delete(dst, false);
// Delete the original directory recursively
fs.delete(dir, true);
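The rename/delete semantics mirror what java.nio.file offers on a local filesystem. A runnable local analogue (the class name RenameDeleteSketch is made up; note this java.nio.file.Path is unrelated to Hadoop's Path):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameDeleteSketch {
    // Local java.nio.file analogue of fs.rename(…) and fs.delete(…, recursive)
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dir");
            Path other = Files.createTempDirectory("other");
            Path file = Files.createFile(dir.resolve("file.txt"));

            // Move file.txt to the other directory and rename it
            Path dst = Files.move(file, other.resolve("otherfile.txt"));
            boolean moved = Files.exists(dst) && !Files.exists(file);

            Files.delete(dst);            // delete the new file
            boolean deleted = !Files.exists(dst);

            Files.delete(other);          // both directories are now empty
            Files.delete(dir);
            return moved + ":" + deleted;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true:true"
    }
}
```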
Further Reading
• Apache Hadoop Documentation
  – User Guide:
    http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
  – Architecture:
    http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
  – Command-Line Interface:
    http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
  – Java API:
    http://hadoop.apache.org/docs/stable/api/index.html
    (look under org.apache.hadoop.fs)
• Cloudera Library
  – http://www.cloudera.com/content/cloudera/en/documentation.html#ClouderaDocumentation
• Google…