Last week I released a new feature for the vanilla_improvements branch of StarCluster: multiple instance type support. Our clusters can now choose which instance type to bid on, based on a configurable factor and the lowest spot market price for each type.
Want to see how it works? Head to the wiki. Want to know more about how I did it? Read on.
This feature was becoming necessary: the spot market price for the instance type we use at Datacratic is no longer the great plains it was months ago but something closer to the Canadian Rockies. Supporting a single instance type meant we often paid more than we would have with another type and, in the worst case, got no instances at all, stalling our computations.
The situation became a burden, as our operations team had to reconfigure the instance type every now and then to keep up. Fine during work hours, not so great outside of them.
As soon as we started talking about multiple instance type support, one thing was clear: we would need a “selection factor” to weigh the price of each instance type. For example, a cr1.8xlarge is not as powerful as a d2.8xlarge, so at the same price, or even when the d2.8xlarge is a bit more expensive, the latter should be picked. Hence the new “selection factor” in the configuration. When comparing two instance types, type A is selected over type B if:
type_a_lowest_price * type_a_selection_factor <= type_b_lowest_price * type_b_selection_factor
Since the factor multiplies the price, a more powerful type should be given a lower factor so that it still wins when it costs a bit more.
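To make the comparison concrete, here is a minimal sketch that applies the formula across any number of types by picking the one with the smallest weighted price. The function name, the dict layout, and the prices and factors are all hypothetical, chosen for illustration; this is not StarCluster's actual code.

```python
# Hypothetical sketch: names, prices and factors are illustrative only,
# not StarCluster's real implementation or configuration.
def pick_instance_type(lowest_prices, selection_factors):
    """Return the instance type with the smallest price * selection factor.

    lowest_prices: dict mapping instance type -> lowest current spot price
    selection_factors: dict mapping instance type -> configured factor
    """
    # min() keeps the first type encountered when two weighted prices tie,
    # which matches the "<=" in the comparison formula.
    return min(lowest_prices, key=lambda t: lowest_prices[t] * selection_factors[t])


# Made-up numbers: d2.8xlarge is more expensive but has a lower factor.
prices = {"cr1.8xlarge": 0.30, "d2.8xlarge": 0.33}
factors = {"cr1.8xlarge": 1.0, "d2.8xlarge": 0.8}
print(pick_instance_type(prices, factors))  # prints "d2.8xlarge" (0.264 < 0.30)
```

With these made-up values, the d2.8xlarge wins even though its raw price is higher, because its weighted price (0.33 × 0.8 = 0.264) is below the cr1.8xlarge's (0.30 × 1.0).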
Following the same logic, it also makes sense to set a different bid price for each instance type.
I had to take care of three things: the configuration format, read for both new and running clusters; the validation routines; and, obviously, the algorithms behind the new functionality. All of that while staying as backward compatible as possible.
This feature was not too hard to build. Since the vanilla_improvements version already had a zone selection enhancement to bid on the cheapest zone, I knew the new implementation naturally belonged in that zone. Reading and validating the configuration was also fairly straightforward. As for backward compatibility, I preserved it by creating three new properties that emulate the old behavior.
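One way to picture how the cheapest-zone lookup, the per-type bid prices and the selection factor fit together is the sketch below. It assumes a hypothetical layout where spot prices are grouped by instance type and then by zone, and that a type is only eligible when its cheapest price is at or below that type's bid. The function name, data layout, zones and prices are illustrative assumptions, not the actual vanilla_improvements code.

```python
# Hypothetical sketch: the data layout, function name, zones and prices are
# assumptions for illustration, not the actual StarCluster implementation.
def choose_bid(spot_prices, selection_factors, bid_prices):
    """Pick (instance_type, zone) to bid on, or None if every type is priced out.

    spot_prices: {instance_type: {zone: current spot price}}
    selection_factors: {instance_type: factor}; a lower factor favors a type
    bid_prices: {instance_type: maximum bid for that type}
    """
    candidates = []
    for itype, by_zone in spot_prices.items():
        # Zone selection step: cheapest zone for this instance type.
        zone, price = min(by_zone.items(), key=lambda kv: kv[1])
        # Skip a type whose lowest price is above its own bid.
        if price > bid_prices[itype]:
            continue
        candidates.append((price * selection_factors[itype], itype, zone))
    if not candidates:
        return None
    # Lowest weighted price wins; exact ties fall back to the type name.
    _, itype, zone = min(candidates)
    return itype, zone


spot = {
    "cr1.8xlarge": {"us-east-1a": 0.35, "us-east-1c": 0.28},
    "d2.8xlarge": {"us-east-1a": 0.40, "us-east-1d": 0.31},
}
factors = {"cr1.8xlarge": 1.0, "d2.8xlarge": 0.8}
bids = {"cr1.8xlarge": 0.50, "d2.8xlarge": 0.30}
print(choose_bid(spot, factors, bids))  # d2.8xlarge is above its 0.30 bid
```

Here the d2.8xlarge would win on weighted price (0.248 vs 0.28) but is excluded because its cheapest spot price (0.31) exceeds its bid, so the sketch falls back to the cr1.8xlarge in us-east-1c. Raising the d2.8xlarge bid above 0.31 would flip the choice.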
There is one feature I didn’t know about before implementing this one, and which I have more or less dropped: the ability to start a cluster with various node types and quantities through a single NODE_INSTANCE_TYPE variable. While the old syntax no longer works, you can achieve the same result by using the new NODE_INSTANCE_ARRAY and defining each node type in its own zone. Furthermore, that syntax only worked when starting a cluster, so removing it does not seem so bad.
Getting the code
I created a separate branch and pull request on purpose, to make it easier to see what was done and possibly to bring the feature into another repository. For now, I have not opened a pull request against the official jtriley repository, because a pull request this one depends on has not been merged yet.
I hope this makes your StarCluster experience better. If you find any bugs, feel free to report them or send pull requests on GitHub.