Perfect-Kafka

This project provides an express Swift wrapper of librdkafka.

This package builds with Swift Package Manager and is part of the Perfect project, but it can also be used as an independent module.

Release Notes for MacOS X

Before importing this library, please install librdkafka first:

$ brew install librdkafka

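Please also note that a proper pkg-config path setting is required. The path below matches Homebrew on Intel Macs; on Apple silicon, Homebrew installs under /opt/homebrew, so use /opt/homebrew/lib/pkgconfig instead: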

$ export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"

Release Notes for Linux

Before importing this library, please install librdkafka-dev first:

$ sudo apt-get install librdkafka-dev

Quick Start

Kafka Client Configurations

Before starting any stream operations, it is necessary to apply settings to clients, i.e., producers or consumers.

Perfect Kafka provides two categories of configuration: Kafka.Config() for global configurations and Kafka.TopicConfig() for topic configurations.

Initialization of Global Configurations

To create a configuration set with default values, simply call:

let conf = try Kafka.Config()

Alternatively, a new configuration can be created by duplicating an existing one:

let conf = try Kafka.Config()
// this will keep the original settings and duplicate a new one
let conf2 = try Kafka.Config(conf)

Initialization of Topic Configurations

Topic configurations are initialized in the same way as global configurations.

To create a topic configuration with default settings, call:

let conf = try Kafka.TopicConfig()

Alternatively, a new topic configuration can be created by duplicating an existing one:

let conf = try Kafka.TopicConfig()
// this will keep the original settings and duplicate a new one
let conf2 = try Kafka.TopicConfig(conf)

Access Settings of Configuration

Both Kafka.Config and Kafka.TopicConfig share the same API for accessing settings.

List All Variables with Value

Kafka.Config.properties and Kafka.TopicConfig.properties expose the settings as a dictionary:

// this will print out all variables in a configuration
print(conf.properties)
// for example, it will print out something like:
// ["topic.metadata.refresh.fast.interval.ms": "250",
// "receive.message.max.bytes": "100000000", ...]

Get a Variable Value

Call get() to retrieve the value of a specific variable:

let maxBytes = try conf.get("receive.message.max.bytes")
// maxBytes would be "100000000" by default

Set a Variable with New Value

Call set() to save settings for a specific variable:

// this will restrict message receiving buffer to 1MB
try conf.set("receive.message.max.bytes", "1048576")
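
Settings are stored and returned as strings, so numeric values need converting in both directions. The sketch below illustrates this in plain Swift, using a literal dictionary as a stand-in for conf.properties; the two entries are copied from the sample output above, and the commented set() call is only an assumed usage.

```swift
// Stand-in for conf.properties; the two entries are copied from the
// sample output shown earlier in this section.
let properties: [String: String] = [
  "topic.metadata.refresh.fast.interval.ms": "250",
  "receive.message.max.bytes": "100000000"
]

// Reading: values arrive as strings and must be parsed before arithmetic.
let maxBytes = properties["receive.message.max.bytes"].flatMap { Int($0) } ?? 0
print(maxBytes / 1_000_000) // 100

// Writing: numbers must be turned back into strings before calling set().
let oneMebibyte = String(1024 * 1024)
print(oneMebibyte) // 1048576

// Assumed usage with a real configuration object:
// try conf.set("receive.message.max.bytes", oneMebibyte)
```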

Producer

Perfect-Kafka provides a Producer class to send data or messages to Kafka hosts. A producer can send one message at a time or multiple messages in a batch. Messages can be either text strings or binary bytes.

let producer = try Producer("VideoTest")
let brokers = producer.connect(brokers: "host:9092")
if brokers > 0 {
  let _ = try producer.send(message: "hello, world!")
}

Before sending any actual messages, a few steps are required to set up the connection to Kafka hosts.

Producer Instance with a Topic

To initialize a Producer instance, a topic name is required, whether or not the topic already exists on the Kafka hosts.

If the topic does not exist when connecting to the Kafka hosts / brokers, Producer() will try to create it; otherwise it will use the existing topic for further operations.

For example, the demo below shows how to start a producer with a topic named "VideoTest":

let producer = try Producer("VideoTest")

Connect to Brokers

Use the connect() method to connect to one or more message brokers, i.e., Kafka hosts (host and port):

let brokers = producer.connect(brokers: "host1:9092,host2:9092,host3:9092")

On success, it returns the number of hosts connected.

Alternatively, brokers can be passed in other forms. For example, hosts can be an array of strings:

let brokers = producer.connect(brokers: ["host1:9092", "host2:9092", "host3:9092"])

or a dictionary:

let brokers = producer.connect(brokers: ["host1": 9092, "host2": 9092, "host3": 9092])
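
The three forms describe the same broker list. As a rough illustration, the array and dictionary forms can be reduced to the comma-separated string form as sketched below. The brokerList helpers are hypothetical and not part of Perfect-Kafka; how the library converts these forms internally is not stated here.

```swift
// Hypothetical helpers, not part of Perfect-Kafka: they only illustrate how
// the three broker formats relate to each other.

/// Joins "host:port" entries into the comma-separated string form.
func brokerList(from hosts: [String]) -> String {
  return hosts.joined(separator: ",")
}

/// Turns a host-to-port dictionary into the comma-separated string form.
/// Dictionary order is unspecified in Swift, so entries are sorted by host
/// name to keep the result stable.
func brokerList(from hosts: [String: Int]) -> String {
  return hosts
    .sorted { $0.key < $1.key }
    .map { "\($0.key):\($0.value)" }
    .joined(separator: ",")
}

let fromArray = brokerList(from: ["host1:9092", "host2:9092", "host3:9092"])
let fromDictionary = brokerList(from: ["host1": 9092, "host2": 9092, "host3": 9092])
print(fromArray)                   // host1:9092,host2:9092,host3:9092
print(fromArray == fromDictionary) // true
```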

Send Messages

Perfect Kafka can send either text or binary messages to brokers, one at a time or in a batch.

| Method | Description | Returns |
| --- | --- | --- |
| send(message: String, key: String? = nil) | a text message with an optional key to send | an Int64 message id |
| send(message: [Int8], key: [Int8] = []) | a binary message with an optional key to send | an Int64 message id |
| send(messages: [(String, String?)]) | text messages with optional keys in an array | [Int64] message IDs for each message |
| send(messages: [([Int8], [Int8])]) | binary messages with optional keys in an array | [Int64] message IDs for each message |
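
The binary overloads take [Int8] rather than the more common [UInt8]. The following is a minimal sketch of converting between a Swift String and that representation using only the standard library. The function names are hypothetical and not part of Perfect-Kafka, and the commented send() call is an assumed usage based on the table above.

```swift
/// Encodes a string as UTF-8 and reinterprets each byte as Int8.
func int8Bytes(_ text: String) -> [Int8] {
  return text.utf8.map { Int8(bitPattern: $0) }
}

/// Reinterprets Int8 values as UTF-8 bytes and decodes them.
/// Invalid UTF-8 sequences are replaced with U+FFFD rather than rejected.
func string(fromInt8 bytes: [Int8]) -> String {
  return String(decoding: bytes.map { UInt8(bitPattern: $0) }, as: UTF8.self)
}

// "\u{E9}" is a precomposed e-acute, which takes two bytes in UTF-8.
let payload = int8Bytes("h\u{E9}llo")
print(payload.count)             // 6
print(payload[1])                // -61, i.e. byte 0xC3 reinterpreted as Int8
print(string(fromInt8: payload)) // héllo

// Assumed usage with the binary overload:
// let id = try producer.send(message: payload, key: int8Bytes("user-42"))
```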

Sent or Not

Perfect Kafka's send() is an asynchronous function, so the library provides a few extra facilities for determining the sending status of each message.

  • OnSent() callback. If set, it is called once for each message that has actually been sent. For example: producer.OnSent = { print("msg #\($0) was sent") }. The only parameter of this event is the Int64 message id returned by send().

  • producer.outbox is an [Int64] array listing the messages still in the sending queue. NOTE Because Kafka is a high-throughput streaming platform, a message that is still in the outbox has not necessarily failed to send, so don't resend such messages unless it has been explicitly confirmed that they failed.

  • OnError() callback. Producer calls this event if something goes wrong, e.g., producer.OnError = { print("error: \($0)") } prints the error message when one occurs.

  • flush(_ seconds: Int) waits up to the given number of seconds for the message queue to clear and the outbox to be flushed.
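
One way to combine these pieces is to keep a local record of the ids returned by send() and clear them as OnSent fires, which avoids resending messages that are merely still queued. The sketch below models that bookkeeping in plain Swift. DeliveryTracker is a hypothetical type, not part of Perfect-Kafka, and the commented lines only suggest how it might be wired to the callbacks described above.

```swift
/// Hypothetical bookkeeping type: remembers which message ids are still
/// waiting for a delivery confirmation.
struct DeliveryTracker {
  private(set) var pending = Set<Int64>()
  private(set) var confirmed = 0

  /// Record an id returned by send().
  mutating func queued(_ id: Int64) {
    pending.insert(id)
  }

  /// Record a confirmation, as reported by an OnSent-style callback.
  /// Ids that were never queued are ignored.
  mutating func sent(_ id: Int64) {
    if pending.remove(id) != nil {
      confirmed += 1
    }
  }
}

var tracker = DeliveryTracker()
for id in [1, 2, 3] as [Int64] { tracker.queued(id) }
tracker.sent(1)
tracker.sent(3)
print(tracker.pending)   // [2]: still queued, not necessarily failed
print(tracker.confirmed) // 2

// Assumed wiring with a real producer:
// tracker.queued(try producer.send(message: "hello"))
// producer.OnSent = { tracker.sent($0) }
```

If the callbacks run on a different thread from the code calling send(), access to the tracker would need to be protected by a lock or a serial queue.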

Consumer

Before actually receiving messages from Kafka with a specific topic, a few procedures are required to initialize a Consumer instance:

let consumer = try Consumer("VideoTest")
let brokers = consumer.connect(brokers: ["host1": 9092, "host2": 9092, "host3": 9092])
guard brokers > 0 else {
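  // connection failed; a guard's else branch must leave the scope,
  // e.g. return or throw inside a function
  fatalError("connection failed")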
}//end guard

Partitions

Once connected, it is a good idea to get the information from the brokers to see if there are sufficient resources, i.e., partitions, for further operations:

let info = try consumer.brokerInfo()
print(info)

The variable info above is a MetaData structure with the following members:

| Member | Type | Description |
| --- | --- | --- |
| brokers | [Broker] | An array of Broker structures |
| topics | [Topic] | An array of Topic structures |

The Broker structure stores information about a broker:

| Member | Type | Description |
| --- | --- | --- |
| id | Int | Broker Id |
| host | String | Host name of the broker |
| port | Int | Port the broker listens on |

The main purpose of the Topic structure is to record the partitions in use by a topic:

| Member | Type | Description |
| --- | --- | --- |
| name | String | Topic name |
| err | Exception | Topic error reported by broker |
| partitions | [Partition] | Partitions of this topic |

The Partition data structure is vital because it provides the partition id used for messaging:

| Member | Type | Description |
| --- | --- | --- |
| id | Int | Partition Id - use this to start / stop messaging |
| err | Exception | Partition error reported by broker |
| leader | Int | Leader broker |
| replicas | [Int] | Replica brokers |
| isrs | [Int] | In-Sync-Replica brokers |

In practice, partition info can be acquired as follows:

let consumer = try Consumer("VideoTest")
let brokers = consumer.connect(brokers: ["host1": 9092, "host2": 9092, "host3": 9092])
guard brokers > 0 else {
  // connection failed; leave the scope (return or throw inside a function)
  fatalError("connection failed")
}//end guard
consumer.OnArrival = { m in print("message : #\(m.offset) \(m.text)")}
let info = try consumer.brokerInfo()
guard info.topics.count > 0 else {
  // no topic found
  fatalError("no topic found")
}//end guard
guard info.topics[0].name == "VideoTest" else {
  // it is not the topic we want
  fatalError("unexpected topic")
}//end guard
let partitions = info.topics[0].partitions

Download Messages From A Partition

The code below shows how to download messages from a partition. This demo assumes let partId = partitions[0].id:

consumer.OnArrival = { m in
  print("message #\(m.offset) : \(m.text)")
}//end event

// start downloading
try consumer.start(partition: partId)
// run until end of program
while(notEndOfProgram) {
  let total = try consumer.poll(partition: partId)
  print("\(total) messages arrived in this moment")
}//end while
consumer.stop(partId)

Here is a walkthrough:

Firstly, the OnArrival() event is a callback that receives a Message data structure:

| Member | Type | Description |
| --- | --- | --- |
| err | Exception | Error status: indicates whether the message is good or not |
| topic | String | topic name of the message |
| partition | Int | partition of the message |
| isText | Bool | whether the message is valid UTF-8 text |
| data | [Int8] | the original binary data of the message body |
| text | String | decoded message body as a UTF-8 string, if isText |
| keyIsText | Bool | whether the key is valid UTF-8 text |
| keybuf | [Int8] | the original binary data of the optional key |
| key | String | decoded key as a UTF-8 string, if keyIsText |
| offset | Int64 | offset of the message within its partition |
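
A handler typically branches on isText before touching text, and falls back to the raw data otherwise. The sketch below shows that pattern in plain Swift. IncomingMessage is a local stand-in that mirrors a few fields from the table so the example runs without a broker; it is not the library's own Message type.

```swift
/// Local stand-in mirroring a few fields of the Message table, so the
/// example runs without a broker. It is not the library's own type.
struct IncomingMessage {
  let offset: Int64
  let isText: Bool
  let text: String
  let data: [Int8]
}

/// Renders a message body: text when it is valid UTF-8, hex bytes otherwise.
func describe(_ m: IncomingMessage) -> String {
  if m.isText {
    return "#\(m.offset) text: \(m.text)"
  }
  let hex = m.data.map { byte -> String in
    let digits = String(UInt8(bitPattern: byte), radix: 16)
    return digits.count == 1 ? "0" + digits : digits
  }
  return "#\(m.offset) binary: \(hex.joined(separator: " "))"
}

print(describe(IncomingMessage(offset: 7, isText: true, text: "hi", data: [104, 105])))
// #7 text: hi
print(describe(IncomingMessage(offset: 8, isText: false, text: "", data: [-1, 0, 16])))
// #8 binary: ff 00 10
```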

Secondly, call start() to start downloading messages: func start(_ from: Position = .BEGIN, partition: Int32 = RD_KAFKA_PARTITION_UA). The parameters are:

  • from: Position, the position in the partition from which to start downloading. Valid values are .BEGIN to download every message from the very beginning, .END to download the most recent one, .STORED to resume from the previously stored position in case of failure, or .SPECIFY(Int64) to start downloading from a specific location. NOTE Use func store(_ offset: Int64, partition: Int32 = RD_KAFKA_PARTITION_UA) to store the offset of a specific message if .STORED is needed.

  • partition: Int32, the partition id.
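
The start positions above resemble librdkafka's special offset constants. The sketch below is a local model of that correspondence: the case names follow the text, the numeric values are those of RD_KAFKA_OFFSET_BEGINNING, RD_KAFKA_OFFSET_END and RD_KAFKA_OFFSET_STORED in librdkafka, and the mapping itself is an assumption about how the wrapper behaves rather than its actual implementation. StartPosition and rawOffset are hypothetical names.

```swift
/// Local model of the start positions described above. This is not the
/// library's Position type; the mapping to librdkafka offsets is assumed.
enum StartPosition {
  case BEGIN           // assumed to correspond to RD_KAFKA_OFFSET_BEGINNING
  case END             // assumed to correspond to RD_KAFKA_OFFSET_END
  case STORED          // assumed to correspond to RD_KAFKA_OFFSET_STORED
  case SPECIFY(Int64)  // an absolute offset within the partition

  /// The offset value that librdkafka would receive for this position.
  var rawOffset: Int64 {
    switch self {
    case .BEGIN: return -2
    case .END: return -1
    case .STORED: return -1000
    case .SPECIFY(let offset): return offset
    }
  }
}

print(StartPosition.BEGIN.rawOffset)       // -2
print(StartPosition.STORED.rawOffset)      // -1000
print(StartPosition.SPECIFY(42).rawOffset) // 42
```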

Then Perfect Kafka provides the poll() function, which waits a short while for activity on a specific partition and returns the number of messages that arrived: func poll(_ timeout: UInt = 10, partition: Int32 = RD_KAFKA_PARTITION_UA). The timeout is the number of milliseconds to wait while polling.

Finally, call stop() to end the messaging.