FLUME-7: Flume node gets into a bad state if the last good config gets set to a bad state.

Review Request #249 - Created July 1, 2010 and submitted

Jonathan Hsieh
old-flume
flume
HenryR, phunt
Due to the earlier fix for CDH-1444 (fix flume node heartbeat hangs with bad configuration), we changed the semantics of starting a logical node to force opens into the driver thread by making open lazy. This broke an invariant: an unopenable logical node (due to its source or sink) would be recorded as the last good config. If a config failed, the default behavior was to revert to the previous last good config, which was the broken configuration. There was no throttling, so this would eventually eat up file handles if they were not cleaned up, and spin up a lot of threads.

We fix this problem by putting the node into the ERROR state on any failure (open, close, next, append) and not attempting to restart with the previous configuration. We also codify the termination behavior by adding another test.
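
As a rough illustration of the behavior described above (not the actual patch), the sketch below shows a driver that parks the node in ERROR on any failure instead of reverting to an earlier configuration. All names here (ErrorStateSketch, Node, Pipe, NodeState, failingOpen) are hypothetical stand-ins and are not Flume's real classes or API.

```java
import java.io.IOException;

// Hypothetical sketch; none of these types are Flume's actual classes.
class ErrorStateSketch {
    enum NodeState { IDLE, ACTIVE, ERROR }

    /** Stand-in for a source/sink pair driven by a logical node. */
    interface Pipe {
        void open() throws IOException;
        String next() throws IOException;              // null when exhausted
        void append(String event) throws IOException;
        void close() throws IOException;
    }

    static final class Node {
        private NodeState state = NodeState.IDLE;
        private int attempts = 0;

        NodeState state() { return state; }
        int attempts() { return attempts; }

        /** Drives one configuration; any failure leaves the node in ERROR. */
        void run(Pipe pipe) {
            attempts++;
            boolean opened = false;
            try {
                pipe.open();
                opened = true;
                state = NodeState.ACTIVE;
                String event;
                while ((event = pipe.next()) != null) {
                    pipe.append(event);
                }
                pipe.close();
                opened = false;
                state = NodeState.IDLE;
            } catch (IOException e) {
                // No rollback to a previous configuration and no retry loop:
                // the node stays in ERROR until a new configuration arrives.
                state = NodeState.ERROR;
                if (opened) {
                    try { pipe.close(); } catch (IOException ignored) { }
                }
            }
        }
    }

    /** A pipe whose open() always fails, modeling an unopenable source or sink. */
    static Pipe failingOpen() {
        return new Pipe() {
            public void open() throws IOException { throw new IOException("cannot open"); }
            public String next() { return null; }
            public void append(String event) { }
            public void close() { }
        };
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.run(failingOpen());
        System.out.println(node.state() + " after " + node.attempts() + " attempt(s)");
    }
}
```

The point of the sketch is that a failed open is attempted once and then left alone, so a broken configuration cannot trigger the unthrottled restart loop described above.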

We also cleaned up some of the error reporting messages along the way.
Renamed LogicalNodeTest -> TestLogicalNode
Added many tests to TestLogicalNode that exercise and define behavior for each error condition.
  1. lgtm
  1. Looks good - is there any need to have lastGoodCfg any more? We don't roll back to it, and it might actually represent a failed configuration. I'd be inclined to remove it unless it's playing a significant role.
    1. It is still needed because it keeps the configuration and the last updated time/version number that is part of a comparison used to trigger installing a new configuration.
  2. For errors, I'm still in general all about logging the entire exception, but your call.