bookmark_borderDetecting unknown charsets

Character encoding. That and date time formats are what I consider the two biggest wastes of programmer time when handling data. For the later, sticking to iso-8601 rules out the problem. (Read my Timestamps post for more information.) For the former, sticking to ASCII or UTF8 should work all the time. However, just like for timestamps, you may not control the source and get some “unfriendly” formats. Here are my tips to detect them.

Continue reading “Detecting unknown charsets”