What are the most common InputFormats in Hadoop?
In Hadoop, input files store the data for a MapReduce job; they typically reside in HDFS. The InputFormat defines how these input files are split and read, and it creates the InputSplits.
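The splitting step described above can be sketched in plain Java. This is an illustrative simplification, not Hadoop's actual API: the class and method names here are made up, and the real FileInputFormat.getSplits() also accounts for HDFS block boundaries and configurable minimum/maximum split sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of how an InputFormat divides a file into
// fixed-size InputSplits. Names are hypothetical, not Hadoop's API.
public class SplitSketch {
    // Returns {startOffset, length} pairs, one per split.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-byte file with a 128-byte split size yields 3 splits.
        for (long[] s : computeSplits(300, 128)) {
            System.out.println(s[0] + "," + s[1]);
        }
    }
}
```

Note that the last split is simply whatever bytes remain, so it is usually shorter than the others.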
The most common InputFormats are:
FileInputFormat- It is the base class for all file-based InputFormats. It specifies the input directory where the data files are present, reads all the files in it, and divides them into one or more InputSplits.
TextInputFormat- It is the default InputFormat of MapReduce. It treats each line of each input file as a separate record and performs no parsing.
Key- the byte offset of the line within the file. Value- the contents of the line, excluding line terminators.
Example content of file- is john may which katty
Key- 0, Value- is john may which katty
KeyValueTextInputFormat- It is similar to TextInputFormat: it also treats each line of input as a separate record. The difference is that TextInputFormat treats the entire line as the value, while KeyValueTextInputFormat breaks the line itself into key and value at the tab character ('\t').
Key- everything up to the tab character. Value- the remaining part of the line after the tab character.
Example content of file (with '->' standing for the tab character)- is -> john may which katty
Key- is, Value- john may which katty
SequenceFileInputFormat- It is the InputFormat that reads sequence files. Key & Value- both are user-defined.
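The key/value rules of TextInputFormat and KeyValueTextInputFormat described above can be sketched as follows. This is a plain-Java illustration of the record semantics, not the actual Hadoop RecordReader classes; the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative sketch (not the Hadoop API) of the key/value rules
// the two text formats apply to a single input line.
public class RecordSketch {
    // TextInputFormat: key = byte offset of the line, value = whole line.
    static Map.Entry<Long, String> textRecord(long byteOffset, String line) {
        return new SimpleEntry<>(byteOffset, line);
    }

    // KeyValueTextInputFormat: split the line at the first tab;
    // key = text before the tab, value = text after it.
    // If the line has no tab, the whole line becomes the key.
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        // The example line from the text, once as-is and once tab-separated.
        System.out.println(textRecord(0, "is john may which katty"));
        System.out.println(keyValueRecord("is\tjohn may which katty"));
    }
}
```

Running this prints the TextInputFormat record (0, whole line) and the KeyValueTextInputFormat record ("is", "john may which katty"), matching the examples above.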