NingG +

Kafka 0.8.1:Consumer API and Consumer Configs

Consumer API

如何从Kafka中读取数据?三种方式:

High Level Consumer API

class Consumer {
  /**
   *  Create a ConsumerConnector
   *
   *  @param config  at the minimum, need to specify the groupid of the consumer and the zookeeper
   *                 connection string zookeeper.connect.
   */
  public static kafka.javaapi.consumer.ConsumerConnector createJavaConsumerConnector(ConsumerConfig config);
}

/**
 *  V: type of the message
 *  K: type of the optional key assciated with the message
 */
public interface kafka.javaapi.consumer.ConsumerConnector {
  /**
   *  Create a list of message streams of type T for each topic.
   *
   *  @param topicCountMap  a map of (topic, #streams) pair
   *  @param decoder a decoder that converts from Message to T
   *  @return a map of (topic, list of  KafkaStream) pairs.
   *          The number of items in the list is #streams. Each stream supports
   *          an iterator over message/metadata pairs.
   */
  public <K,V> Map<String, List<KafkaStream<K,V>>>
	createMessageStreams(Map<String, Integer> topicCountMap, Decoder<K> keyDecoder, Decoder<V> valueDecoder);

  /**
   *  Create a list of message streams of type T for each topic, using the default decoder.
   */
  public Map<String, List<KafkaStream<byte[], byte[]>>> createMessageStreams(Map<String, Integer> topicCountMap);

  /**
   *  Create a list of message streams for topics matching a wildcard.
   *
   *  @param topicFilter a TopicFilter that specifies which topics to
   *                    subscribe to (encapsulates a whitelist or a blacklist).
   *  @param numStreams the number of message streams to return.
   *  @param keyDecoder a decoder that decodes the message key
   *  @param valueDecoder a decoder that decodes the message itself
   *  @return a list of KafkaStream. Each stream supports an
   *          iterator over its MessageAndMetadata elements.
   */
  public <K,V> List<KafkaStream<K,V>>
	createMessageStreamsByFilter(TopicFilter topicFilter, int numStreams, Decoder<K> keyDecoder, Decoder<V> valueDecoder);

  /**
   *  Create a list of message streams for topics matching a wildcard, using the default decoder.
   */
  public List<KafkaStream<byte[], byte[]>> createMessageStreamsByFilter(TopicFilter topicFilter, int numStreams);

  /**
   *  Create a list of message streams for topics matching a wildcard, using the default decoder, with one stream.
   */
  public List<KafkaStream<byte[], byte[]>> createMessageStreamsByFilter(TopicFilter topicFilter);

  /**
   *  Commit the offsets of all topic/partitions connected by this connector.
   */
  public void commitOffsets();

  /**
   *  Shut down the connector
   **/
  public void shutdown();
}

You can follow this example to learn how to use the high level consumer api.

Simple Consumer API

class kafka.javaapi.consumer.SimpleConsumer {
/**
*  Fetch a set of messages from a topic.
*
*  @param request specifies the topic name, topic partition, starting byte offset, maximum bytes to be fetched.
*  @return a set of fetched messages
*/
public FetchResponse fetch(kafka.javaapi.FetchRequest request);

/**
*  Fetch metadata for a sequence of topics.
*
*  @param request specifies the versionId, clientId, sequence of topics.
*  @return metadata for each topic in the request.
*/
public kafka.javaapi.TopicMetadataResponse send(kafka.javaapi.TopicMetadataRequest request);

/**
*  Get a list of valid offsets (up to maxSize) before the given time.
*
*  @param request a [[kafka.javaapi.OffsetRequest]] object.
*  @return a [[kafka.javaapi.OffsetResponse]] object.
*/
public kafak.javaapi.OffsetResponse getOffsetsBefore(OffsetRequest request);

/**
* Close the SimpleConsumer.
*/
public void close();
}

For most applications, the high level consumer Api is good enough. Some applications want features not exposed to the high level consumer yet (e.g., set initial offset when restarting the consumer). They can instead use our low level SimpleConsumer Api. The logic will be a bit more complicated and you can follow the example in here.

Kafka Hadoop Consumer API

Providing a horizontally scalable solution for aggregating and loading data into Hadoop was one of our basic use cases. To support this use case, we provide a Hadoop-based consumer which spawns off many map tasks to pull data from the Kafka cluster in parallel. This provides extremely fast pull-based Hadoop data load capabilities (we were able to fully saturate the network with only a handful of Kafka servers).

Usage information on the hadoop consumer can be found here.

Consumer Configs

The essential consumer configurations are the following:

下文将详细介绍这些参数:

notes(ningg):consumer group?复习一下,为什么有这个?本质:Kafka中一条message,发送到哪些地方呢?一种是群发给Consumer,一种是只发送给某一个满足条件的Consumer;同时message要求在同一个Consumer中保证message的处理顺序,在满足这一功能需求的情况下,同时为了改善性能,增加了一个概念:consumer group,同一个group下可以包含多个consumer,每次group接收到message,就实例化其内部的一个consumer,如果一个partition中的message就发送给一个group,则顺序处理;否则就是并发处理。疑问:一个consumer group中只包含一个consumer就能够实现串行顺序处理了,为什么还要放置多个consumer?

notes(ningg):在设置zookeeper.connect时,可以设置zookeeper的chrootchroot的含义:改变元数据在global Zookeeper namespace中的存储位置;一旦修改了chroot,就需要在链接Zookeeper时,也用上chroot,具体形式:hostname1:port1,hostname2:port2,hostname3:port3/chroot/path。(当前理解,前面的/chroot/pathhostname1:port1也是有效的)

notes(ningg):难道不是consumer每成功fetch一个message,就commit一次offset?

notes(ningg):message chunk什么意思?有用吗?

notes(ningg):参数zookeeper.session.timeout.ms与参数auto.commit.interval.ms之间的关系,前者衡量的是heartbeat,而后者负责的是offset commit。

More details about consumer configuration can be found in the scala class kafka.consumer.ConsumerConfig.

参考来源

Top