Request-Reply pattern using Apache Kafka or How not to loose your data

Preface

There are many patterns to help with software design. The Gang-of-Fourth provides us with patterns to organize the code better. The other list of patterns called Enterprise Integration patterns (EIP) is intended to help with communications between applications.

One of EIP is Request-Reply.

It is very simple. When one service needs in some data it sends a Request to the other service which is responsible of such data. Then responsible service prepares an Response and provides the Requestor with it.

Request-Reply pattern
  1. Risk scoring of user (wallet)
  2. Risk scoring of country where the payment is going to (according to political verdicts)
  3. Risk scoring of payment type
  1. when the main payment processing service meet a payment having the status “antifraud is to be checked for”, it send Payment ID into Kafka topic with name “PAYMENT-READY-FOR-ANTIFRAUD”
  2. Antifraud service catch this payment ID from Kafka, enrich it with full payment data reading it from database, then sends payment model into Kafka topic named “PAYMENT-DETAILS”
  3. PAYMENT DETAILS Kafka topic is read by RMS in order to:
    a) receives payment
    b) asks user risk score from Know-Your-Customer (KYC) service sending the Request into Kafka topic with name “WALLET-KYC-REQUEST”
    c) Reply for such Request is waited from Kafka topic having name “WALLET-KYC-RESPONSE” by RMS . Here is Request-Reply pattern — each request waits its reply to proceed further calculations.
    d) country score and payment type score are gathered from local database by RMS both
    e) combining all three risk score values altogether RMS is to calculate payment risk score
    f) payment risk score is sent into Kafka topic with name “PAY-DECISION”
  4. Antifraud service listens to payment decisions from Kafka topic, processes them in order to approve or decline the payment

The Situation

The described above solution was successfully tested onto two of our independent QA stands. But it started to fail right after it was deployed onto production infrastructure (we use Kubernetes).

The Analysis

Such the problem is usual in microservice architecture. Developers must keep in mind that their applications might be started in more that one instances in parallel. And when the data is needed to be processed inside one instance of application (in one transaction or without any) from the beginning till the end, it really important to be convicted that the application does not consume the data not intending to itself. Being more precise, the application should consume the data intending only to itself.

Kafka topics

All Kafka topics are divided into partitions.

Kafka clients

Kafka clients are of types: producer and consumer. First one puts the data into Kafka topic and the latter reads the data listening (polling) it from Kafka topic accordingly.

Kafka producers

When the Kafka producer sends message into Kafka topic it is put only in one partition of the topic. The magic is how the producer chooses the partition to put message in. Every message might have the Key (it might be null at all). All Producers have their Hash-Function which helps to select a concrete partition to put the message to. So that Hash-Function takes the message Key as an argument to calculate the Result. Then this Result is the Partition number where the message will be put to.

partitionNumber = messageKey % partitionsCount

Kafka consumers

In other side the Consumer does not read the data along. It have to work in Consumer Group (remember group-id in connection properties?). Kafka guarantee every message is delivered every consumer group. Kafka guarantee that every message is delivered to only one consumer in each consumer group.

The Problem

RMS service had one producer to write into WALLET-KYC-RESPONSE and one consumer to read from WALLET-KYC-RESPONSE. KYC service was not scaled at that moment — it listened all the partitions and wrote in one of partitions of topic WALLET-KYC-RESPONSE. All RMS consumers (in all instances of RMS) worked into one consumer group. Please take a look at the scheme below.

Miscommunication into the World of Microservices

The Solution

The good solution would be to adjust all producers to put messages right into the topics that was listened to by consumers which was in need of the very data. It became a mess! The other side — such scenario is very expansive to maintain. They must to keep in mind the whole scheme of interactions between four microservices and to tune them all if only one of them needs scaling. If they do not, the data would have might be lost.

The better solution

Our solution takes for about two hours to be implemented. And the maintaining will be inexpensive.

String groupId = "rms-" + UUID.random().toString();

Conclusions

  1. The design of asynchronous interaction should take into account not only aspects of the interaction itself, but also the stages of architecture expansion.
  2. Synchronous interaction would be cheaper to support the solution we found.
  3. When implementing Request-Reply on Kafka, it is important that the retention periods of messages in the topic in the Kafka broker are set to values greater than the time of a possible restart of applications — this will reduce the risk of data loss during restarts.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Vyacheslav Boyko

Vyacheslav Boyko

Software developer, writing in Java. Over 15 years in IT. Enthusiast in EDA, Microservices, Integrations, experiencing in finance, trading, banking. He/Him.