
I am a Software Engineering graduate (MEng) from the University of Southampton currently working for VisualDNA in Shoreditch, UK. My job mostly involves Java programming focussed on highly concurrent and scalable systems for big data. I also use Scala, bash, Hive, Pig and other tools to manipulate and manage our events data. I am interested in distributed systems, security (both physical and digital) and big data technologies.

Book Review: Apache Kafka

01.07.2014
Published by: Packt Publishing
ISBN: 9781782167938

Reviewer Ratings

Relevance: 4
Readability: 4
Overall: 3


One Minute Bottom Line

Buy this book if you are new to or interested in Kafka, find the documentation a bit daunting or need to get up to speed on Kafka quickly.

Do not buy this book if you are just interested in the chapter on integration with Storm and Hadoop or you already know your way around Kafka.

Review

When I received this book, the first thing that struck me was its length. It is a very short book, weighing in at a mere sixty-nine pages. Despite its brevity, it covers quite a wide range of topics. Some of these are very useful for newcomers, such as how to actually install Kafka and its design fundamentals. This well-grounded approach to learning about Kafka continues throughout most of the book, making it excellent for anyone who wants to learn more about Kafka and perhaps experiment with it.

This book is definitely aimed at those new to Kafka. It opens by discussing why Kafka is needed and some of the problems it solves. This is a good way to start, as it shows the reader what Kafka is designed for and lets them quickly determine whether it fits their needs. In keeping with this beginner-centric approach, it even explains how to install Kafka in several different modes, which will help anyone who wants to get an experimental environment up and running quickly. The book gives examples frequently, and each one states which type of cluster to set up, along with settings such as the replication factor and the number of partitions for the topics the example creates. This hands-on approach means that by the end of the book, if the examples are followed faithfully, the reader will have a good idea of how to set up and manage the various parts of a Kafka cluster in most of the common configurations I have come across. Some more advanced options are covered too, such as running several brokers on one machine that share a single ZooKeeper instance.
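To make the multi-broker setup concrete, here is a minimal Java sketch (not taken from the book) that builds the per-broker settings for several brokers on one host. It assumes the Kafka 0.8-era `server.properties` keys `broker.id`, `port`, `log.dirs` and `zookeeper.connect`; newer Kafka releases configure the listening address through `listeners` instead of `port`. The base port 9092, the `/tmp/kafka-logs-` prefix and the `LocalBrokerConfigs` class are illustrative assumptions. The point it shows is that each broker needs a unique id, port and log directory while all of them share the same ZooKeeper connection string; it also encodes the rule that a topic's replication factor cannot exceed the number of brokers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Builds per-broker settings for several Kafka brokers on one machine that
 * share a single ZooKeeper instance. Keys follow the 0.8-era server.properties;
 * the base port and log directory prefix are illustrative assumptions.
 */
public class LocalBrokerConfigs {
    // Assumed values for a local experiment, not taken from the book.
    static final int BASE_PORT = 9092;
    static final String LOG_DIR_PREFIX = "/tmp/kafka-logs-";

    static Properties brokerProperties(int brokerId, String zkConnect) {
        Properties props = new Properties();
        // Every broker in the cluster needs a unique id...
        props.setProperty("broker.id", Integer.toString(brokerId));
        // ...its own port, because all brokers share one host...
        props.setProperty("port", Integer.toString(BASE_PORT + brokerId));
        // ...and its own log directory so brokers do not overwrite each other's data.
        props.setProperty("log.dirs", LOG_DIR_PREFIX + brokerId);
        // All brokers point at the same ZooKeeper, which is what makes them one cluster.
        props.setProperty("zookeeper.connect", zkConnect);
        return props;
    }

    static List<Properties> cluster(int brokerCount, String zkConnect) {
        List<Properties> configs = new ArrayList<>();
        for (int id = 0; id < brokerCount; id++) {
            configs.add(brokerProperties(id, zkConnect));
        }
        return configs;
    }

    /** A topic's replication factor must be at least 1 and at most the broker count. */
    static boolean canHostTopic(int brokerCount, int replicationFactor) {
        return replicationFactor >= 1 && replicationFactor <= brokerCount;
    }

    public static void main(String[] args) {
        List<Properties> configs = cluster(3, "localhost:2181");
        for (Properties p : configs) {
            System.out.println("broker " + p.getProperty("broker.id")
                    + " -> port " + p.getProperty("port")
                    + ", logs in " + p.getProperty("log.dirs"));
        }
        System.out.println("Replication factor 3 allowed: " + canHostTopic(configs.size(), 3));
    }
}
```

Each `Properties` object could then be saved as a separate properties file (for example with `Properties.store`) and passed to its own broker process.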

Despite the clear focus on beginners, I believe this book is still useful to those with some Kafka experience. At the time of writing, I have been using Kafka 0.7 for around six months. The book has a section that concisely covers the differences between Kafka 0.7 and 0.8, how to set up both kinds of cluster and some of the other tools, such as mirroring, and it briefly mentions how to migrate from 0.7 to 0.8. The book as a whole focuses on version 0.8 but is still useful for those wishing to experiment with 0.7. I think this is probably the best approach that could have been taken: given publishing timescales and how close version 0.8 is to a final release, 0.8 is probably what most newcomers should use, and the brief comparison will help existing users decide whether they need to switch. The section on migration unfortunately lacks concrete examples, but since this is a beginner-focused book and only existing users would need this information, I can see why they were left out. On the other hand, I think this attempt to keep the scope narrow backfires near the end of the book, when it turns to integration with Storm and Hadoop. It briefly describes what these technologies are, then outlines the general approach and the classes to use. Given how useful these sections could be, they sorely lack the easy-to-follow examples found throughout the rest of the book, and I feel this leaves them outside the book's scope.

My favourite thing about this book is that it clearly defines its audience early on and sticks to it (right up to its last few pages). It is an excellent book for beginners, full of simple examples, diagrams and screenshots showing expected output. There is also a clear progression: it starts with setup and the command-line tools and moves on to more advanced custom programs. All in all, it is a good introductory book that covers topics often omitted from introductory texts, reinforces earlier points and keeps the reader well grounded by explaining the reasoning behind Kafka's existence and design.

What I disliked most about this book was the final few pages. The blurb leads the reader to expect to finish the book knowing how to integrate Kafka with Storm and Hadoop, but the book only briefly covers the approach to take and some classes to read up on, so the last few pages feel very rushed. I have worked on a job that mirrored Kafka data to Hadoop, and while I recognised the class names and the approach as correct, it is a reasonably complicated task. Given the book's apparent target audience, I believe these pages do not really fit with the rest of the contents.

This book could definitely be improved by expanding the final chapter to include more beginner-friendly content. Also, at the time of writing, the final example in the section on writing consumers is very poor. It was clearly meant to show how to build a multi-threaded consumer, but once instantiated the executor service remains unused and all the partitions are consumed from the control thread. The example therefore appears to work but does not actually demonstrate multi-threaded consumption. It can be fixed relatively easily, but given that consuming multiple partitions in parallel is probably one of the more common use cases, it is disappointing that this error made it this far. The example does not seem to have been properly thought out in terms of multithreading: even if it is changed to use many threads, the executor is shut down immediately after it is initialised, so it needs refactoring. I have reported this error to the publisher and they are currently reviewing my suggested corrections, but it is worth pointing out that the example is incorrect, to spare readers frustration when they try to apply what they have learned to real use cases.
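The kind of fix described above can be sketched in plain Java without a Kafka dependency. In this minimal sketch, each `List<String>` is a hypothetical stand-in for one of the per-partition streams a real consumer would obtain (for example the `KafkaStream` objects returned by the 0.8 high-level consumer's `createMessageStreams`); the `ParallelStreamConsumer` class and its method names are assumptions for illustration, not the book's code. It submits one task per stream to a fixed-size `ExecutorService`, so the control thread does no consuming itself, and it calls `shutdown()` only after every task has been submitted, followed by `awaitTermination`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * One-thread-per-stream consumption sketch. Each List<String> is a
 * hypothetical stand-in for a per-partition stream (e.g. a KafkaStream).
 */
public class ParallelStreamConsumer {

    /**
     * Drains every stream on its own pool thread and returns the number of
     * messages consumed. Names of the threads that did the work are added to
     * threadNames so callers can check that the control thread stayed idle.
     */
    static int consumeAll(List<List<String>> streams, Set<String> threadNames) {
        AtomicInteger consumed = new AtomicInteger();
        // One thread per stream (at least one, since a pool of size 0 is invalid).
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, streams.size()));
        for (List<String> stream : streams) {
            // Hand the stream to the pool; the control thread consumes nothing itself.
            executor.submit(() -> {
                for (String message : stream) {
                    threadNames.add(Thread.currentThread().getName());
                    // Real code would process `message` here.
                    consumed.incrementAndGet();
                }
            });
        }
        // Only now stop accepting new work; already-submitted tasks keep running.
        executor.shutdown();
        try {
            // Block the control thread until every stream has been drained.
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        List<List<String>> streams = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            List<String> partition = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                partition.add("partition-" + p + "-msg-" + i);
            }
            streams.add(partition);
        }
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        int total = consumeAll(streams, threadNames);
        System.out.println("Consumed " + total + " messages on "
                + threadNames.size() + " worker thread(s)");
    }
}
```

`shutdown()` does not cancel tasks that have already been submitted; it only rejects new ones, so calling it before submitting would instead make every later `submit` call throw a `RejectedExecutionException`. With a real Kafka stream the iteration normally blocks waiting for new messages, so a production version would also need a way to stop the consumer before awaiting termination.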

You can buy this book from Packt Publishing
Published at DZone with permission of its author, Dexter Lowe.
