Ceilometer + Monasca = Ceilosca
Integrating Monasca and Ceilometer seemed like a very good idea from the start. It would bring together all the OpenStack resource notifications and metrics and provide a unified storage layer for Monitoring and Metering, simplifying deployment at scale and opening the door to new solutions that weren’t possible before.
So the team set out to make it real. The implementation was carried out over a three-month period, and all the code, unit tests and load simulator are open source and available in the official OpenStack repo at: https://github.com/openstack/monasca-ceilometer
You can also find a replay of the presentation given at the OpenStack Summit in Tokyo: https://www.youtube.com/watch?v=5-IvVwIoCzM
There are at least two aspects that made this “marriage” appealing:
- Ceilometer collects metrics for substantially all the OpenStack resources, which Monasca does not do today.
- Ceilometer's biggest issue is scalability and performance, and that is where Monasca excels.
So, we embarked on this experiment and found a neat way to integrate the two services. The first part was the ingestion. Ceilometer has two main types of agents:
- Notification, a.k.a. Push model agent
- Central/Compute, a.k.a. Pull model agent.
Our integration with Monasca aimed to address all the cases and support both the Push and Pull models. The Compute Agent is probably the part with the most overlap with the Monasca Agent, which is capable of polling libvirt or other virtualization layers. We decided to extend the Publisher code in Ceilometer to integrate with the Monasca client and send the “measurements” to the Monasca API.
Current Ceilometer architecture:
This brought two distinct advantages to the solution: the first is that we can integrate with any of the Ceilometer agents out of the box, so we can integrate data from all the data sources that Ceilometer supports now and in the future; the second is that we remove the RabbitMQ re-publishing of Samples.
This latter aspect is particularly problematic in large deployments. RabbitMQ clustering has some limitations around the 20M-mark load; this can slow down the queue performance and impact other services relying on the queue. In the Ceilosca case the samples are sent as “measurements” directly to the Monasca API, which stores them in Kafka. This lets the publishing rate differ from the storage rate, which improves performance and allows each layer to be optimized separately.
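To make the sample-to-measurement step concrete, here is a minimal sketch of how a Ceilometer-style sample might be mapped to the metric body that the Monasca API accepts (name, dimensions, timestamp in milliseconds, value). The function name, the sample dictionary layout and the choice of which fields become dimensions are assumptions made for illustration; this is not the actual Ceilosca publisher code.

```python
from datetime import datetime, timezone


def sample_to_measurement(sample):
    """Map a Ceilometer-style sample dict to a Monasca-style metric dict.

    Hypothetical helper: the field mapping below is an illustrative
    assumption, not the mapping used by the real publisher.
    """
    # Parse the ISO 8601 timestamp; treat naive timestamps as UTC.
    ts = datetime.fromisoformat(sample["timestamp"])
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    # Monasca timestamps are expressed in milliseconds since the epoch.
    timestamp_ms = int(ts.timestamp() * 1000)

    # Identifying attributes become dimensions; missing ones are skipped.
    candidates = {
        "resource_id": sample.get("resource_id"),
        "project_id": sample.get("project_id"),
        "user_id": sample.get("user_id"),
        "unit": sample.get("unit"),
        "type": sample.get("type"),
    }
    dimensions = {k: str(v) for k, v in candidates.items() if v is not None}

    return {
        "name": sample["name"],
        "dimensions": dimensions,
        "timestamp": timestamp_ms,
        "value": float(sample["volume"]),
    }
```

Because every measurement carries its own name and dimensions, measurements from different metrics can sit side by side in one request body.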
The Monasca Publisher in the Ceilometer agent also leverages another important aspect of publishing to Monasca: batching. The Monasca Publisher for Ceilometer has three parameters that can be set to control the batching behavior and performance:
- Batch count. This specifies how many messages are buffered and sent in one HTTP request to the Monasca API. The Monasca API accepts “measurements” from different “metrics” without having to aggregate them, and this is a huge performance boost.
- Batch timeout. This specifies the maximum time to wait before sending the batch. Usually this is helpful when your message bus is only handling events and not polling, which means it is rare to get a huge number of messages in a short window of time.
- Batch checking interval. This dictates how often the publisher checks the batch size and timeout to decide when it is time to make the API POST request. Clearly this has to be set carefully: small enough to avoid excessively delaying the publishing of “measurements”, but not so small that it causes repeated useless checks.
In several of our tests we found that a batch of 1000 messages and a timeout of 15s with a checking interval of 5s is a good compromise for a mix of Central Agent loads and OpenStack Notifications.
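The interplay of the three batching parameters can be sketched as follows. This is a simplified illustration, not the Ceilosca publisher itself: the class name, method names and the injected `send` and `clock` callables are all hypothetical, and the defaults simply mirror the values quoted above.

```python
import time


class BatchingPublisher:
    """Buffer measurements and flush on batch count or batch timeout.

    Hypothetical sketch: `send` stands in for the HTTP POST to the
    Monasca API and `clock` is injectable so the logic can be tested.
    """

    def __init__(self, send, batch_count=1000, batch_timeout=15.0,
                 clock=time.monotonic):
        self._send = send
        self._batch_count = batch_count
        self._batch_timeout = batch_timeout
        self._clock = clock
        self._buffer = []
        self._oldest = None  # time the oldest buffered item arrived

    def publish(self, measurement):
        """Buffer one measurement; nothing is sent here."""
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(measurement)

    def check(self):
        """Run once per batch checking interval; return True if a batch was sent."""
        if not self._buffer:
            return False
        full = len(self._buffer) >= self._batch_count
        expired = self._clock() - self._oldest >= self._batch_timeout
        if not (full or expired):
            return False
        # Send at most batch_count items in a single request.
        batch = self._buffer[:self._batch_count]
        self._buffer = self._buffer[self._batch_count:]
        # Restart the timeout for whatever is left in the buffer.
        self._oldest = self._clock() if self._buffer else None
        self._send(batch)
        return True
```

In a running agent, `check()` would be called from a loop or timer that sleeps for the batch checking interval (5s in the configuration above) between calls.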
We all know that deploying and running OpenStack services is not the easiest thing on Earth. For this reason we wanted to move away from sophisticated deployments and make sure the deployment was well understood and took a single command. We wanted everybody to be able to get Ceilosca running on a single VM (or box), so we thought to leverage DevStack. We know, DevStack is for development and not for scale and performance testing, but guess what: if it scales in DevStack it will scale everywhere else.
What we needed next was a deployment script: a single unified script to install everything and have it running. Fortunately, both Monasca and DevStack already had deployment scripts that we could run and leverage. The only difference? Monasca uses Ansible and DevStack uses bash. So we created a new bash script that installs DevStack and then runs Ansible to deploy Monasca on top of DevStack, and that did the trick. Once you download the repo just go and execute:
and (depending on your env) after some time you will get a full DevStack with Ceilosca in it and you are: Ready to GO!
The DevStack+Ceilosca+Monasca setup is the environment where we ran all the tests, and we had it running on both virtual machines and bare metal.
Note, we now have a complete DevStack plugin for Monasca.
As we mentioned before, the tests were running on DevStack. This is to make sure that the tests are repeatable by anyone who is interested in running them. Clearly DevStack brought some restrictions that we had to deal with. Moreover, some of us decided to run these tests in OpenStack VMs and that made it even more challenging … (hey, we may even try stuff on containers later on, maybe using Kolla…). I will post the results of these tests in 2 separate blogs relating to Private and Public Cloud.
Ceilosca turned out to be a significant improvement over Ceilometer in both data ingestion and querying. The performance gain is quite staggering, going from 2x to 4x in ingestion speed and throughput and from 11x to 18x in querying. These are the main takeaways from the extensive testing we performed:
- Ceilometer has an exponential performance degradation as the number of tenants and resources grows.
- Ceilometer has open-ended queries that do not force the requestor to supply query parameters such as tenant_id and time interval. This has been mitigated with the introduction of limits in the Liberty release, but the API could still be significantly improved for performance.
- Ceilosca has very efficient batching capabilities across the entire workflow, and they are configurable based on the specific needs of a cloud deployment. Ceilosca can also select which metadata to preserve and which to discard. This is a high value feature.
- The Monasca API is nearly twice as fast as Ceilosca's implementation of the Ceilometer V2 API. For users that do not need backward compatibility we recommend consuming the data directly from Monasca.
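As a rough illustration of the metadata selection mentioned in the takeaways above, the sketch below keeps only an allow-listed set of metadata keys and drops everything else. The function name, the allow-list approach and the example keys are assumptions for illustration only; the source does not describe how Ceilosca actually configures this.

```python
def select_metadata(resource_metadata, keep):
    """Return only the metadata entries whose keys are in `keep`.

    Hypothetical helper: values are stringified and None values are
    dropped so the result can be attached to a measurement as-is.
    """
    return {
        key: str(value)
        for key, value in resource_metadata.items()
        if key in keep and value is not None
    }


# Example with made-up metadata keys: only the allow-listed ones survive.
preserved = select_metadata(
    {"display_name": "vm-1", "image_ref": "abc", "host": None},
    keep={"display_name", "host"},
)
```

Discarding unneeded metadata at publish time keeps each measurement small, which matters when a thousand of them travel in one batch.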
Cisco: Fabio Giannetti, Ken Owens, Srinivas Sakhamuri, Pauline Yeung, Steven Irvin
HP: Roland Hockmuth, Dan Dyer, Atul Aggarway, Jenny Wei, Putta Challa, Rohit Jaisway