StarCluster: Multiple node instance type support

Last week I released a new feature for the vanilla_improvements branch of StarCluster: multiple instance type support. This means the cluster can now choose which instance type to bid on based on a configurable factor and the lowest spot market price for each type.
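To make the selection rule concrete, here is a minimal Python sketch. It is not StarCluster's code: the function `pick_instance_type`, the meaning given to the factor (assumed here to be each type's relative capacity, so the price is divided by it) and the prices in the usage lines are all hypothetical.

```python
from typing import Dict


def pick_instance_type(spot_prices: Dict[str, float],
                       factors: Dict[str, float]) -> str:
    """Return the instance type with the lowest spot price per unit of factor.

    Hypothetical sketch: `factors` is assumed to describe how much capacity
    each type offers relative to the others (2.0 = twice as much as 1.0).
    Types without a configured factor are ignored.
    """
    cost_per_unit = {
        instance_type: price / factors[instance_type]
        for instance_type, price in spot_prices.items()
        if instance_type in factors
    }
    if not cost_per_unit:
        raise ValueError("no instance type has both a spot price and a factor")
    return min(cost_per_unit, key=cost_per_unit.get)


# Made-up prices for illustration only, not real spot market data.
prices = {"m3.medium": 0.010, "m3.large": 0.030, "c3.large": 0.015}
factors = {"m3.medium": 1.0, "m3.large": 2.0, "c3.large": 2.0}
print(pick_instance_type(prices, factors))  # c3.large
```

Dividing by the factor means a type that costs twice as much can still win if, under this assumption, it provides more than twice the capacity.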

Want to see how it works? Head to the wiki. Want to know more about how I did it? Read further.

Continue reading “StarCluster: Multiple node instance type support”

Post Ubuntu upgrade: hip-chat not starting (fix)

I have just upgraded my Ubuntu installation to the latest release (15.04) and found that HipChat no longer started. When I launched it from the terminal, I got the following error:
%> hipchat
/usr/bin/hipchat: /opt/HipChat/bin/..//lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /usr/lib/x86_64-linux-gnu/libicuuc.so.52)

Simply reinstalling it fixed the issue. First, remove it:
%> sudo apt-get remove hipchat

To reinstall it, though, I had to re-enable the Atlassian repository, which the Ubuntu upgrade procedure had commented out:
%> sudo vim /etc/apt/sources.list.d/atlassian-hipchat.list

Remove the leading “#” to re-enable the line, then run:
%> sudo apt-get update
%> sudo apt-get install hipchat

That fixed the problem for me.

Parsers printing rule: make sure you print what you parsed

There should be a rule for all parsers: a parser's “print” method should always render a string that can be parsed back without changing the semantics. In pseudocode, it translates to:

initial_string = "parse me"

//parse back
assert to_string(parse(initial_string)) == initial_string

//don't change the semantics
assert parse(to_string(parse(initial_string))) == parse(initial_string)

The same check in JavaScript with JSON:

var jsonStr = '{"key1":"val1","key2":"val2"}';
JSON.stringify(JSON.parse(jsonStr)) === jsonStr; // true
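Python's `json` module shows the difference between the two assertions nicely. The helper `check_round_trip` below is hypothetical; it simply runs both checks from the pseudocode for one input string.

```python
import json


def check_round_trip(text, parse, to_string):
    """Run both checks from the rule for a single input string."""
    parsed = parse(text)
    printed = to_string(parsed)
    same_text = printed == text                # strict: identical string
    same_semantics = parse(printed) == parsed  # weaker: same parsed value
    return same_text, same_semantics


compact = '{"key1":"val1","key2":"val2"}'

# json.dumps adds a space after ":" and "," by default, so the text differs,
# but parsing the printed string yields the same dict.
print(check_round_trip(compact, json.loads, json.dumps))  # (False, True)


def print_compact(value):
    # Compact separators reproduce the input's formatting exactly.
    return json.dumps(value, separators=(",", ":"))


print(check_round_trip(compact, json.loads, print_compact))  # (True, True)
```

The semantic check is the one that must always hold; the strict one also depends on the printer reproducing the input's formatting.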

As programmers, the less we need to think and worry, the better. Parsers that follow this rule can be used with confidence; they will never betray you. If that doesn’t convince you of the importance of consistency, here are a couple of examples of errors caused by inconsistent parsers and printers.
Continue reading “Parsers printing rule: make sure you print what you parsed”

Query string white space vs plus

Trivia: What is the difference between the encoded query string parameters “a+b” and “a%20b”?

Answer: Nothing! They are both encoded representations for “a b”.

Isn’t a “+” supposed to remain a “+”? Well, the URL path and the query string are not encoded following the same rules. In the path, a “+” is indeed a literal “+”, but in the query string a “+” stands for a space, so a literal “+” must be encoded as “%2B”. This can be misleading.
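Python's `urllib.parse` follows the same convention: its query-string helpers use form encoding, where a space is written as “+”, while `unquote` applies path rules and leaves “+” untouched. The parameter name `p` is arbitrary.

```python
from urllib.parse import parse_qs, unquote, unquote_plus, urlencode

# Query-string decoding: "+" and "%20" both mean a space.
print(parse_qs("p=a+b"))    # {'p': ['a b']}
print(parse_qs("p=a%20b"))  # {'p': ['a b']}

# Encoding for a query string: a literal "+" becomes "%2B",
# while a space becomes "+".
print(urlencode({"p": "a+b"}))  # p=a%2Bb
print(urlencode({"p": "a b"}))  # p=a+b

# Path-style decoding keeps "+"; form-style decoding turns it into a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```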
Continue reading “Query string white space vs plus”

Detecting unknown charsets

Character encoding. That and date-time formats are what I consider the two biggest wastes of programmer time when handling data. For the latter, sticking to ISO 8601 rules out the problem (read my Timestamps post for more information). For the former, sticking to ASCII or UTF-8 avoids it. However, just like with timestamps, you may not control the source and may receive some “unfriendly” formats. Here are my tips for detecting them.
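As a rough illustration (a simple heuristic, not necessarily the tips the full post gives), one can check for a byte-order mark and then try strict decoders from most to least restrictive. The function `guess_encoding` is hypothetical, and its Latin-1 fallback is a guess rather than a detection, since every byte sequence decodes as Latin-1.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess the encoding of `data` (heuristic sketch, not a real detector)."""
    # 1. A byte-order mark identifies the encoding outright.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name

    # 2. Strict decoders, most restrictive first: ASCII is a subset of UTF-8.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue

    # 3. Latin-1 maps every byte to a character, so it never fails:
    #    treat it as a last-resort guess.
    return "latin-1"


print(guess_encoding(b"hello"))                   # ascii
print(guess_encoding("café".encode("utf-8")))     # utf-8
print(guess_encoding("café".encode("latin-1")))   # latin-1
```

Random bytes rarely form valid UTF-8, which is why a successful strict UTF-8 decode is a reasonably strong signal.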

Continue reading “Detecting unknown charsets”

Python: warnings and deprecation

Alongside the logging library sits the lesser-known warnings library. The former is meant to log events related to execution, whereas the latter is meant to warn about improper module usage or deprecated functions. By default, most warnings are displayed only once per location, so they will not clutter your logs by being shown repeatedly. However, some categories (e.g. PendingDeprecationWarning) are ignored by default, hence not displayed at all. This is the key difference from logging: you can control warnings from the command line, through the -W option.
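A minimal sketch of both sides: emitting a deprecation warning, then changing how it is handled. `old_function` and `new_function` are hypothetical names. Inside a program, `warnings.simplefilter` changes the filters; from the command line, `-W` does the same without touching the code, e.g. `python -W error::DeprecationWarning script.py` turns deprecation warnings into exceptions.

```python
import warnings


def new_function():
    return 42


def old_function():
    """Hypothetical deprecated function kept for backward compatibility."""
    # stacklevel=2 attributes the warning to the caller, not to this line.
    warnings.warn("old_function() is deprecated; use new_function()",
                  DeprecationWarning, stacklevel=2)
    return new_function()


# Record warnings instead of printing them, and keep every occurrence.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_function()
    old_function()

print(len(caught))                  # 2
print(caught[0].category.__name__)  # DeprecationWarning

# In-code equivalent of running with -W error::DeprecationWarning.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        old_function()
    except DeprecationWarning as exc:
        print("raised:", exc)
```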

Continue reading “Python: warnings and deprecation”

Python counters on unhashable types

Have you ever heard of or used Python counters? They are very useful for counting the occurrences of “simple” (hashable) items. Basically:

> from collections import Counter
> colors = ['red', 'blue', 'red', 'green']
> Counter(colors)
Counter({'red': 2, 'blue': 1, 'green': 1})

However, it does not work on unhashable types such as lists:

> colors = [['red', 'warm'], ['blue', 'cold'], ['red', 'warm']]
> Counter(colors)
[...]
TypeError: unhashable type: 'list'

What do we do then?
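One common workaround (not necessarily the one the full post settles on) is to count a hashable key derived from each item rather than the item itself: `tuple` for lists, or a canonical JSON string for dictionaries. The helper `count_by_key` is hypothetical.

```python
import json
from collections import Counter


def count_by_key(items, key):
    """Count items by a hashable key computed from each one."""
    return Counter(key(item) for item in items)


colors = [['red', 'warm'], ['blue', 'cold'], ['red', 'warm']]
print(count_by_key(colors, tuple))
# Counter({('red', 'warm'): 2, ('blue', 'cold'): 1})

# Dicts: sort_keys=True makes the key order irrelevant.
records = [{'color': 'red', 'tone': 'warm'},
           {'tone': 'warm', 'color': 'red'},
           {'color': 'blue', 'tone': 'cold'}]
counts = count_by_key(records, lambda d: json.dumps(d, sort_keys=True))
print(counts.most_common(1))
```

The counted keys are tuples or strings, so convert them back (e.g. with `list(key)` or `json.loads(key)`) if you need the original type.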

Continue reading “Python counters on unhashable types”

StarCluster: Streaming node addition refactoring and dupe alias fixing

About a month ago I created the streaming node addition functionality within StarCluster. As time went by, I fixed some of its issues, but found the code messy and hard to understand, so I moved it to a separate file. The new version is ready and battle-tested.

Another feature that was not working as expected is the handler for nodes sharing the same alias. I fixed it in a clean commit that is easy to pull or cherry-pick. Using it is only a matter of calling _recover_duplicate_aliases.