
FLUME-41: Added Voldemort sink for Flume

Review Request #960 - updated 3 years, 5 months ago

Dunith Dhanushka Reviewers
flume
https://issues.cloudera.org/browse/FLUME-41
None flume
This is the Voldemort sink for Flume. Dependencies include voldemort-0.81.jar, log4j-1.2.15.jar, and google-collect-1.0.jar. The class responsible for the sink is voldemort.VoldemortSink.
Basic integration has been done with a console-based source and syslog.

Example configuration for the sink:
voldemortSink("tcp://localhost:6666","test","DAY")
Review request changed
Updated 3 years, 6 months ago (October 7th, 2010, 1:40 a.m.)
Added sample sink configuration.
Posted 3 years, 6 months ago (October 9th, 2010, 7:45 a.m.)
Dunith,

This is a good first stab.

High level -- there are a few things we need to figure out, and some style/code issues:

1) What are your plans for how this should be distributed?
2) There are some nits, and some error checking should be added.
3) I have a question about how the key in your append works. Can you point a Voldemort person here to review the Voldemort parts as well?

Thanks,
Jon.
  1. Jon,
    
    1) I'm planning to keep this as a separate project so that people can download it and plug it in.
    
    2) I've made the modifications mentioned and pushed the changes to the repo. I'm also working on the test cases and hope to complete them soon.
    
    3) Regarding the Voldemort key generation strategy, you can contact Roshan Sumbaly from LinkedIn.
LICENSE (Diff revision 1)
 
 
 
 
What are your plans for how to distribute this?  

If you want to keep the license this way, my suggestion is to keep it as a separate project that people can download and plug in. This is the approach that the Cassandra and elastic search plugins have taken (and the approach I will take with other plugins).

I eventually want plug-ins to be packaged and installed as separate RPMs and debs -- but making this easy requires a bit more supporting infrastructure.

To be included with the trunk (maybe in a plugin dir), a bunch more work would need to happen, and the license would have to be the standard ASF/Cloudera one. We'd also need test cases, and need to setup some build stuff so that we do not include voldemort binaries!
  1. "..need to setup some build stuff so that we do not include voldemort binaries!"
    
    I'm not clear on that. Do you mean Mavenizing the project?
  2. Since your plan is to keep this as a separate project, I think the question matters less.  
    
    However, if people want to go the trunk way, we will likely require any external binaries to be brought in via maven/ivy. 
README (Diff revision 1)
 
 
this seems funny
README (Diff revision 1)
 
 
maybe $FLUME_HOME?
  1. Corrected to $FLUME_HOME.
README (Diff revision 1)
 
 
A user may have their own settings in their flume-site.xml.  

You might instead suggest copying or augmenting the flume.plugin.classes setting, and mention that an example is in flume-site.xml.template.
  1. Changes applied
README (Diff revision 1)
 
 
this seems funny
  1. Something went wrong with the patch generation... :)
src/voldemort/Granularity.java (Diff revision 1)
 
 
probably want an Apache license copyright header here.
  1. Standard ASF/Cloudera license has been added to header.
src/voldemort/VoldemortSink.java (Diff revision 1)
 
 
license
  1. Standard ASF/Cloudera license has been added to header.
src/voldemort/VoldemortSink.java (Diff revision 1)
 
 
Sinks should throw an exception if open is called twice in a row without a close call or an error between the two calls.
  1. Changes applied.
src/voldemort/VoldemortSink.java (Diff revision 1)
 
 
This should check whether client is null and throw IllegalStateException if it is (because the sink is not open).
  1. Changes applied
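The two lifecycle checks requested above (reject a second open without a close in between, and reject append while the client is null) could look roughly like the following. This is a minimal sketch and not the patch under review: GuardedSink and InMemoryClient are hypothetical names, and the in-memory map only stands in for a real Voldemort store client.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sink showing only the open/append state checks. */
class GuardedSink {
    // null means "not open"; set by open(), cleared by close().
    private InMemoryClient client;

    /** Opens the sink; a second open without an intervening close is an error. */
    void open() {
        if (client != null) {
            throw new IllegalStateException("open() called on a sink that is already open");
        }
        client = new InMemoryClient();
    }

    /** Appends an event; only legal while the sink is open. */
    void append(String key, String event) {
        if (client == null) {
            throw new IllegalStateException("append() called on a sink that is not open");
        }
        client.put(key, event);
    }

    /** Closes the sink so that it can be opened again later. */
    void close() {
        client = null;
    }

    boolean isOpen() {
        return client != null;
    }

    public static void main(String[] args) {
        GuardedSink sink = new GuardedSink();
        sink.open();
        sink.append("20101010", "hello");
        try {
            sink.open(); // second open without close
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
        sink.close();
    }
}

/** Hypothetical in-memory stand-in for a store client; not the Voldemort API. */
class InMemoryClient {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }
}
```

Using a null client as the "closed" marker keeps the check cheap and means open, append, and close all agree on a single piece of state.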
src/voldemort/VoldemortSink.java (Diff revision 1)
 
 
Does the granularity mean that there can only be one event for day/hour/minute?

What is supposed to happen when I set minute granularity and then insert 100 elements in that same minute?  Version might have something to do with it but my intuition makes me think that the version should be integrated with the key instead of (or as well as) the value.

I don't know enough about Voldemort semantics to know if this makes sense. Maybe point a Voldemort person here to review as well?
  1. "Does the granularity mean that there can only be one event for day/hour/minute?"
    
    No. If you set day granularity, there will be only one key for that day, and all events will be appended to that key (we retrieve the existing value from Voldemort using that key, concatenate the new event, and store it back). E.g., if today is 2010-10-10, the key will be 20101010.
    
    "What is supposed to happen when I set minute granularity and then insert 100 elements in that same minute?"
    Then there will be a unique key per minute. If the time is 10:30 AM on 2010/10/10, the key for that minute is 201010101030 and all 100 events will be appended to that key. In other words, if you fetch the value of key 201010101030, you will find the 100 events appended together in it. Events are delimited with the pipe character '|'.
    
    Please contact Roshan Sumbaly from LinkedIn for more clarification.
  2. Wow, I think I get what you are doing and it seems unscalable!
    
    Let's say I have 1MM events shoved into the current minute's value, each of which is 100 bytes. Does that mean that for the (1MM+1)th event I am fetching 100MB, appending 100 bytes, and then writing back 100MB + 100B?
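To make the key scheme and the cost concern in this thread concrete, here is a rough sketch of the behavior as described in the replies above. It is not the code from the patch: KeyGranularity, AppendingStore, the HOUR pattern, and the use of UTC are assumptions for illustration, a HashMap stands in for the Voldemort store, and character counts stand in for bytes.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

class GranularityDemo {
    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar calendar = Calendar.getInstance(utc, Locale.ROOT);
        calendar.clear();
        calendar.set(2010, Calendar.OCTOBER, 10, 10, 30, 0);
        Date when = calendar.getTime();

        // All events from the same minute share one key.
        String key = KeyGranularity.MINUTE.keyFor(when, utc);
        AppendingStore store = new AppendingStore();
        for (int i = 0; i < 100; i++) {
            store.append(key, "event" + i);
        }
        System.out.println("day key:    " + KeyGranularity.DAY.keyFor(when, utc));
        System.out.println("minute key: " + key);
        System.out.println(store.charsRead + " chars read, "
                + store.charsWritten + " chars written");
    }
}

/** Hypothetical granularities; the HOUR pattern is an assumption. */
enum KeyGranularity {
    DAY("yyyyMMdd"), HOUR("yyyyMMddHH"), MINUTE("yyyyMMddHHmm");

    private final String pattern;

    KeyGranularity(String pattern) {
        this.pattern = pattern;
    }

    /** Renders the timestamp, truncated to this granularity, as a key. */
    String keyFor(Date when, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(zone);
        return format.format(when);
    }
}

/** In-memory stand-in for the store; counts characters moved per append. */
class AppendingStore {
    private final Map<String, String> values = new HashMap<>();
    long charsRead = 0;
    long charsWritten = 0;

    /** Read the current value, concatenate with '|', and write it all back. */
    void append(String key, String event) {
        String existing = values.get(key);
        String updated;
        if (existing == null) {
            updated = event;
        } else {
            charsRead += existing.length();
            updated = existing + "|" + event;
        }
        values.put(key, updated);
        charsWritten += updated.length();
    }

    String get(String key) {
        return values.get(key);
    }
}
```

In this model every append re-reads and rewrites the whole accumulated value, so the data moved per call grows with the number of events already stored under the key, and the total grows roughly with the square of the event count. That is the scaling concern raised above.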
src/voldemort/VoldemortSink.java (Diff revision 1)
 
 
this should check to see if the factory is null before calling close (if it is null, it can just return)

you probably want factory=null after close.

also, what about client?
  1. Changes applied. Also, the client object is set to null after closing the sink.
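A close method along the lines requested here might look like the sketch below: return early when the factory is null, and null out both the factory and the client afterwards. ClosableSink and FakeFactory are hypothetical names used only for illustration; a real sink would hold the Voldemort client factory and store client instead.

```java
/** Hypothetical sink showing only the close-side cleanup. */
class ClosableSink {
    private FakeFactory factory; // null while the sink is closed
    private Object client;       // placeholder for a store client

    void open(FakeFactory newFactory) {
        if (factory != null) {
            throw new IllegalStateException("sink is already open");
        }
        factory = newFactory;
        client = new Object();
    }

    /** Safe to call on a sink that was never opened or is already closed. */
    void close() {
        if (factory == null) {
            return; // nothing to release
        }
        try {
            factory.close();
        } finally {
            // Clear both references even if close() throws, so the sink
            // is never left half-open.
            factory = null;
            client = null;
        }
    }

    boolean isOpen() {
        return client != null;
    }

    public static void main(String[] args) {
        ClosableSink sink = new ClosableSink();
        FakeFactory factory = new FakeFactory();
        sink.open(factory);
        sink.close();
        sink.close(); // second close is a no-op
        System.out.println("factory close calls: " + factory.closeCalls);
    }
}

/** Hypothetical stand-in for a client factory; counts close() calls. */
class FakeFactory {
    int closeCalls = 0;

    void close() {
        closeCalls++;
    }
}
```

Clearing the references in a finally block is a design choice of this sketch: it keeps a failed close from blocking a later open.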