https://xkcd.com/1722/

Recently at work I was tasked with fixing our Cruise Control installation. We had identified a bug where Cruise Control was not getting the metrics it needed to rebalance a Kafka cluster. Another teammate had already done the discovery on this and determined that we needed to set up Prometheus to get metrics out of our Kafka clusters (AWS MSK clusters) and into Cruise Control. He sent me these two links as references:

Open monitoring with Prometheus

Using LinkedIn's Cruise Control for Apache Kafka with Amazon MSK - Amazon Managed Streaming for Apache Kafka

This seemed pretty straightforward so I dove in!

Gotcha #1

My first step was to build Docker images for both Prometheus and Cruise Control so I could test our hypothesis locally. I had a Prometheus container running and collecting metrics from a Kafka cluster with very few unexpected hurdles.

Next, Cruise Control.

The container for this was also fairly simple and straightforward. The first gotcha came when I tried to get Cruise Control to connect to Prometheus. The AWS guide says to put something like this in the cruisecontrol.properties file:

# Prometheus Metric Sampler specific configuration
# Replace with your Prometheus IP and port
prometheus.server.endpoint=1.2.3.4:9090

I had put both the Cruise Control and Prometheus containers on the same Docker network, so my Prometheus URL was going to be http://172.17.0.2:9090.

I set up my cruisecontrol.properties file to read the endpoint from an environment variable, so I could pass the URL when running the container rather than baking it into the Docker image.

# Prometheus Metric Sampler specific configuration
prometheus.server.endpoint=${env:PROMETHUES_SERVER}
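The `${env:...}` placeholder gets swapped for the variable's value when the configuration is loaded. As a rough illustration only, here is a minimal Python sketch of that kind of substitution. The `resolve_env_placeholders` function is hypothetical and is not how Cruise Control resolves the value internally; the sketch simply assumes the placeholder is replaced by the variable's value and that everything else on the line is kept verbatim.

```python
import os
import re

# Matches ${env:NAME} placeholders, where NAME is letters, digits and underscores.
# Hypothetical resolver for illustration only, not Cruise Control's config loader.
_ENV_PLACEHOLDER = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_placeholders(value, environ=None):
    """Replace each ${env:NAME} in value with the NAME environment variable.

    Raises KeyError if a referenced variable is not set. Text around the
    placeholder, including any stray whitespace, is left untouched.
    """
    env = os.environ if environ is None else environ
    return _ENV_PLACEHOLDER.sub(lambda m: env[m.group(1)], value)


if __name__ == "__main__":
    fake_env = {"PROMETHUES_SERVER": "http://172.17.0.2:9090"}
    # Placeholder alone: resolves to exactly the variable's value.
    print(repr(resolve_env_placeholders("${env:PROMETHUES_SERVER}", fake_env)))
    # Placeholder followed by a space: the space is carried into the result.
    print(repr(resolve_env_placeholders("${env:PROMETHUES_SERVER} ", fake_env)))
```

Anything after the closing brace, even a single space, ends up in the resolved value, so the exact text on that line matters.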

But when starting the Cruise Control service it would throw this error:

ERROR Uncaught exception on thread Thread[main,5,main] (com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain)
com.linkedin.cruisecontrol.common.config.ConfigException: Prometheus endpoint URI is malformed, expected schema://host:port, provided http://172.17.0.2:9090

This was extremely confusing because my URL matched the expected input pattern, schema://host:port. I tried removing the protocol, using a hostname instead of an IP, and other random things, hoping one of them would work. None did. So I turned to the internet to figure out what was going on and eventually found this GitHub issue that seemed related.

I ended up reading more of the Cruise Control source code than I care to admit, but it eventually led me to suspect a trailing space on the line where I was setting my prometheus.server.endpoint value. When I double-checked, sure enough, there was a trailing space, and when I removed it Cruise Control started up without a problem. It reminded me that I should add a whitespace visualizer extension to VS Code.
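The Java properties format, as documented for `java.util.Properties.load`, makes every remaining character on a line part of the value, trailing whitespace included. Assuming the config file is read that way, this is how the stray space survived. The simplified Python sketch below approximates that rule (it ignores backslash escapes, line continuations, and whitespace-only key separators) and adds a small helper that flags lines ending in spaces or tabs, as a scriptable alternative to eyeballing the file. Both function names are made up for this example.

```python
from typing import List, Optional, Tuple


def parse_properties_line(line: str) -> Optional[Tuple[str, str]]:
    """Roughly split one properties line into (key, value).

    Simplified: leading whitespace is skipped, the key ends at the first
    '=' or ':', whitespace around that separator is dropped, and the rest
    of the line, trailing whitespace included, becomes the value.
    """
    stripped = line.lstrip()
    if not stripped or stripped[0] in "#!":
        return None  # blank line or comment
    for i, ch in enumerate(stripped):
        if ch in "=:":
            return stripped[:i].rstrip(), stripped[i + 1:].lstrip()
    return stripped.rstrip(), ""  # key with no value


def find_trailing_whitespace(lines: List[str]) -> List[int]:
    """Return the 1-based numbers of lines that end in a space or tab."""
    flagged = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")  # ignore the line terminator itself
        if text.endswith((" ", "\t")):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    config = [
        "# Prometheus Metric Sampler specific configuration\n",
        "prometheus.server.endpoint=http://172.17.0.2:9090 \n",
    ]
    print(parse_properties_line(config[1].rstrip("\r\n")))
    print(find_trailing_whitespace(config))
```

Running the helper over a config file before building the image would have flagged the offending line.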

Mounting soap box

As noted in the GitHub issue, I really think that if a program is going to validate its input, it should strip leading and trailing whitespace before performing that validation. And if it doesn't, its logs should be formatted so that it is obvious when a stray space is the issue. In my case, the log adds no punctuation after my provided value, so a trailing space is impossible to notice.
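Here is a rough Python sketch of the strip-then-validate idea. The regex is a hypothetical stand-in for a `schema://host:port` check, not Cruise Control's actual (Java) validation, and `validate_endpoint` is a made-up name. When the stripped value still fails, the error reports the raw input with `!r`, so any leading or trailing whitespace shows up inside quotes.

```python
import re

# Hypothetical stand-in for a "schema://host:port" check; the real Cruise
# Control validation is Java code and may accept or reject different inputs.
_ENDPOINT_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s:/]+:\d{1,5}")


def validate_endpoint(raw: str) -> str:
    """Strip surrounding whitespace, then require schema://host:port."""
    value = raw.strip()
    if not _ENDPOINT_PATTERN.fullmatch(value):
        # !r wraps the raw input in quotes, exposing any stray whitespace.
        raise ValueError(
            f"endpoint is malformed, expected schema://host:port, provided {raw!r}"
        )
    return value


if __name__ == "__main__":
    # The trailing space is tolerated because it is stripped before matching.
    print(validate_endpoint("http://172.17.0.2:9090 "))
    try:
        validate_endpoint("http://172.17.0.2")  # missing port
    except ValueError as err:
        print(err)
```

Stripping first means a harmless typo never reaches the validator, and quoting the raw value means any remaining failure is still easy to diagnose.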

My personal approach to solving this is to wrap inputs in brackets when logging them, so it is 100% clear what value the program is ingesting. For example, in Python I would write it like this using f-strings: