-
-
flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java (Diff revision 1) What happens when we reach the bound here?
-
flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java (Diff revision 1) I think I prefer passing FlumeConfigData instead of string src string snk here. This will update the version numbers properly from the master's clock.
-
flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java (Diff revision 1) This seems like it does a double config now because the config times on this heartbeat are going to be newer (even though it is the same config). This is safer than before because at least this time it is in the same thread.
-
flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java (Diff revision 1) We probably need make interrupted the only way out -- catch runtime/throwables report then and then loop until interrupted.
-
-
-
It concerns me that test tests are now timing dependent and depend on sleep. The previous version was setup so that we could interleave different actions or parts of actions in a way to test racy conditions. This is no longer the case.
FLUME-706: Flume nodes launch duplicate logical nodes
Review Request #1896 - Created Aug. 10, 2011 and submitted
Information | |
---|---|
Eric Sammer | |
old-flume | |
FLUME-706 | |
Reviewers | |
flume | |
eerpini, jon, nerdynick |
This patch should prevent the "double spawn" case where Flume creates a logical node and immediately attempts to reconfigure it. The reason this is bad is because if the node grabs resources (e.g. a port in the case of the thrift RPC source), the subsequent spawn can run into errors that prevent both instances from working properly (due to some bad error handling). What I've done: * Refactored the DirectDriver <--> PumperThread relationship to be unidirectional. PumperThread is now a static inner class rather than just an inner class and state is explicitly exposed to it from the driver. The driver no longer holds certain bits of state and always fetches it directly from the PumperThread. This *should* be a net simplification, but it's still complicated. * LivenessManager used to perform two master RPCs in two different places: once to spawn newly discovered nodes and another to update a node's configuration. Due to some hinkiness, the second operation would always decide to reconfigure the newly spawned node (which was the root cause of the issue). This should no longer be true. All spawn / config updates are now centralized and triggered asychronously (which is not entirely different, but different enough to call out). * LogicalNodeManager no longer attempts to micro manage LogicalNode state (i.e. no longer invokes LogicalNode#loadNodeDriver()). This is encapsulated by the LogicalNode itself. I'm not entirely sure this covers all corner cases. In fact, I'm pretty sure it doesn't. Given the criticality of this code, please be extra critical in review and local testing. This patch was part of an emergency fix for a specific user.
Ran the full flume-core unit test suite. Some failures in the form of "stuck" error states. Looking into it, but little time available.