According to the documentation on Incremental Bulk Loads:
Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load makes use of write-ahead log files.
Tables are available for client operations, such as put, get, and scan operations, during incremental bulk loads.
You can use incremental bulk loads to ingest large amounts of data to an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in process. A table can perform multiple incremental bulk load operations simultaneously.
Got a couple of questions...
- Could someone explain how the write-ahead log (WAL) is used during an incremental bulk load?
- Does the data in an incremental bulk load have to be sorted by row key?
If I had to guess, I'd say that the data has to be sorted, is then split by row key across the correct region servers (err... tablet servers?), and is then bulk-written to the appropriate memstore. But I'm probably wrong.
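To make my guess concrete, here is a minimal Python sketch of the flow I'm picturing. It is purely illustrative: the region start keys, the function names, and the sample rows are all made up, and nothing here is taken from the actual bulk load implementation.

```python
from bisect import bisect_right

# Hypothetical region start keys, sorted; the first region starts at the empty key.
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1


def split_by_region(rows):
    """Sort (row_key, value) pairs by key, then group them per region index."""
    batches = {}
    for key, value in sorted(rows):
        batches.setdefault(region_for(key), []).append((key, value))
    return batches


# Made-up sample rows, deliberately out of order.
rows = [("pear", 1), ("apple", 2), ("zebra", 3), ("grape", 4)]
batches = split_by_region(rows)
```

In this picture each batch would then be written in bulk to its region. What I can't tell from the documentation is whether the real load works this way, and where the WAL fits in.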