Add buffering to SeqfileEventSink

Review Request #1842 - Created June 24, 2011 and submitted

Chetan Sarva
old-flume
FLUME-682
flume
Add output buffering to the SeqfileEventSink with the buffer size configurable in flume-conf.xml but defaulting to 64k. 
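
For readers without the diff at hand, here is a minimal sketch of the idea, not the code from the patch. The property key `example.seqfile.buffer.bytes` and the class name are hypothetical (the real key in flume-conf.xml is not shown in this review), and a plain `java.util.Properties` stands in for Flume's configuration object. It only illustrates a 64k default with an override, and that buffered bytes do not reach the underlying stream until a flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class BufferedSinkSketch {
    // Hypothetical key: the actual property name used by the patch is not given here.
    static final String BUFFER_SIZE_KEY = "example.seqfile.buffer.bytes";
    // 64k default, as stated in the review description.
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    // Read the buffer size from the configuration, falling back to the default.
    static int bufferSize(Properties conf) {
        String value = conf.getProperty(BUFFER_SIZE_KEY);
        return value == null ? DEFAULT_BUFFER_SIZE : Integer.parseInt(value.trim());
    }

    // Wrap the raw stream so that small writes accumulate in memory first.
    static OutputStream open(OutputStream raw, Properties conf) {
        return new BufferedOutputStream(raw, bufferSize(conf));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stands in for the file
        OutputStream out = open(disk, new Properties());

        out.write("event-1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes on 'disk' before flush: " + disk.size()); // 0

        out.flush();
        System.out.println("bytes on 'disk' after flush: " + disk.size()); // 8
    }
}
```

The trade-off is the one raised in the comments below: fewer, larger writes, at the cost of events sitting in memory until the buffer fills or is flushed.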


  1. A few nits and a request to either update documentation or code about change.
  2. We chatted about potentially making this an option.  If you think it is better this way, can you add a note to the release notes, in a 0.9.5 section, about the change in semantics?  (Alternatively, make it an option and then document the option.)
    1. I'm not sure that there's really a case for keeping the option in there. I'm leaning towards removing it altogether as it is in the current patch, and simply adding a note as you suggested. Something like: 
      
      The WAL subsystem no longer flushes on every write (FLUME-682).
  3. no @author tags.
    
  4. flume-core/src/main/java/org/apache/hadoop/io/RawSequenceFileWriter.java (Diff revision 1)
    please fix spacing
Review request changed
  1. lgtm.  Just one clarification question.
  2. Did you check whether the checksumming takes a significant toll on performance?
    1. Hadoop's SequenceFile.Writer by default creates a .crc file for each .seq file it writes. This is important for HDFS, but since we don't use these files, this Writer implementation exists only to skip that step. The FlushingSequenceFileWriter also skipped .crc creation, so this really just keeps the new code in line with the current implementation in this regard.
    2. great.