It is clear that text input files are read by the RecordReader at every newline delimiter. Can anyone clarify on what basis sequence files, such as binary or compressed files, are read by the RecordReader?
A sequence file is a file format consisting of a header followed by one or more blocks. The header records the key type, the value type, and the compression (if any) used for the file's data. The key-value records are bundled into the blocks. The block delimiters are called "sync markers", and the block size is tunable.
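For reference, the header metadata described above is visible through the standard Hadoop API. The sketch below writes and reads a small sequence file; the path and the IntWritable/Text type choices are just assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

public class SeqFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/demo.seq");   // hypothetical path

        // Key class, value class, and compression type go into the header.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(CompressionType.BLOCK))) {
            writer.append(new IntWritable(1), new Text("first record"));
            writer.append(new IntWritable(2), new Text("second record"));
        }

        // The reader recovers the types from the header; callers need not know them.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            System.out.println(reader.getKeyClass());     // IntWritable
            System.out.println(reader.getValueClass());   // Text
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```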
Sync markers mark a logical boundary of a record (or a set of records) and are what allows records to be read correctly across block splits in HDFS/MapR-FS. As you mentioned, the line record reader of the text input format looks for lines terminated by '\n'. Sequence files have no such record terminator, so a sync marker is used instead to find the ends of records, allowing the sequence file record reader to read them back correctly.
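To make the resync idea concrete, here is a toy sketch in plain Java (this is NOT the real SequenceFile byte layout, and the 16-byte constant marker is invented for the demo): a reader handed an arbitrary split offset scans forward to the next sync marker and only then starts consuming records, which is exactly how the sequence file reader aligns on a record boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SyncMarkerDemo {
    // Toy 16-byte sync marker (the real one is a per-file random value).
    static final byte[] SYNC = new byte[16];
    static { Arrays.fill(SYNC, (byte) 0xAB); }

    // Find the first sync marker at or after 'from'; return the offset just
    // past it, or the end of the data if no marker remains in this split.
    static int resync(byte[] data, int from) {
        outer:
        for (int i = from; i + SYNC.length <= data.length; i++) {
            for (int j = 0; j < SYNC.length; j++) {
                if (data[i + j] != SYNC[j]) continue outer;
            }
            return i + SYNC.length;
        }
        return data.length;
    }

    public static void main(String[] args) {
        // "File" layout for the toy: record "alpha", sync marker, record "bravo".
        byte[] r1 = "alpha".getBytes(StandardCharsets.UTF_8);
        byte[] r2 = "bravo".getBytes(StandardCharsets.UTF_8);
        byte[] file = new byte[r1.length + SYNC.length + r2.length];
        System.arraycopy(r1, 0, file, 0, r1.length);
        System.arraycopy(SYNC, 0, file, r1.length, SYNC.length);
        System.arraycopy(r2, 0, file, r1.length + SYNC.length, r2.length);

        // A split starting mid-record (offset 2, inside "alpha") cannot know
        // where it is, so it resyncs and begins with the next whole record.
        int start = resync(file, 2);
        String rec = new String(file, start, file.length - start, StandardCharsets.UTF_8);
        System.out.println(rec);   // bravo
    }
}
```

The partial record skipped by one split is not lost: the previous split reads past its nominal end until it hits the same marker, so every record is read exactly once.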
In what context? Spark? MapR Streams? Something else?
SequenceFileInputFormat is designed to deal with sequence files. They are binary files that store sequences of binary key-value pairs. To read data from sequence files as the input to MapReduce, you should use this input format class. The key and value types are determined by the sequence file (this metadata is stored in the file header), so you need to make sure that your map input types correspond. For example, if your sequence file has IntWritable keys and Text values, then the map signature would be Mapper&lt;IntWritable, Text, K, V&gt;, where K and V are the types of the map output keys and values.
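A minimal job sketch for the IntWritable/Text case described above (class names like SeqFileJob and the value-to-key inversion in the mapper are illustrative assumptions, not part of the original answer):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SeqFileJob {
    // Map input types (IntWritable, Text) must match the sequence file header.
    static class InvertMapper extends Mapper<IntWritable, Text, Text, IntWritable> {
        @Override
        protected void map(IntWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(value, key);   // e.g. swap key and value
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "read-seqfile");
        job.setJarByClass(SeqFileJob.class);
        job.setMapperClass(InvertMapper.class);
        // Tell the framework the input is a sequence file, not plain text.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```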