Multilanguage support when dealing with String comparisons

We’ve just come across something quite unexpected.

Unless the language is specifically set to Turkish, the Turkish character ‘ı’ (dotless i) behaves in a non-obvious way in computing. Its capital form is ‘I’, which happens to be the same as the capital of the Latin ‘i’ when the language is set to any other Latin-alphabet language. If the language is set to Turkish, the capital of the Latin ‘i’ is ‘İ’ (a capital I with a dot), which isn’t part of the basic Latin alphabet, rather than ‘I’. So only under Turkish rules do the two letters keep distinct capitals.
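To make these mappings concrete, here is a minimal Java sketch. It assumes Java's locale-aware `String.toUpperCase(Locale)`. The class and method names are invented for illustration, and `Locale.ROOT` stands in for "a language that is not Turkish".

```java
import java.util.Locale;

public class DotlessI {
    // U+0131 (small dotless i) and U+0130 (capital I with dot above), written as
    // escapes so the source compiles regardless of the file encoding.
    static final String DOTLESS_SMALL = "\u0131";
    static final String DOTTED_CAPITAL = "\u0130";

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Capitalise with locale-neutral rules (stand-in for a non-Turkish setting).
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Capitalise with Turkish rules.
    static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    public static void main(String[] args) {
        // Outside Turkish rules, both letters capitalise to the plain ASCII 'I'.
        System.out.println(upperRoot(DOTLESS_SMALL).equals(upperRoot("i"))); // true
        // Under Turkish rules, 'i' keeps its dot and becomes U+0130...
        System.out.println(upperTurkish("i").equals(DOTTED_CAPITAL));        // true
        // ...while the dotless letter still becomes 'I'.
        System.out.println(upperTurkish(DOTLESS_SMALL).equals("I"));         // true
    }
}
```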

This obscure behaviour has an interesting consequence for string comparisons. For example, comparing ‘Yarın’ and ‘Yarin’ in lower case will yield different strings (‘yarın’ vs ‘yarin’)… but if we compare upper(‘Yarın’) with upper(‘Yarin’), both become ‘YARIN’ and are treated as the same string if the underlying language is NOT set to Turkish. Under Turkish rules, upper(‘Yarin’) is ‘YARİN’, so the two stay different…
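The comparison above can be reproduced with a small Java sketch. It again assumes `String.toUpperCase(Locale)` and `String.toLowerCase(Locale)`, uses `Locale.ROOT` as the non-Turkish case, and the class and helper names are hypothetical.

```java
import java.util.Locale;

public class YarinComparison {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // "Yarın" spelled with U+0131 (dotless i); "Yarin" with the ASCII 'i'.
    static final String WITH_DOTLESS = "Yar\u0131n";
    static final String WITH_DOTTED = "Yarin";

    // True if the two strings are identical after lower-casing with the given locale.
    static boolean sameWhenLower(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    // True if the two strings are identical after upper-casing with the given locale.
    static boolean sameWhenUpper(String a, String b, Locale locale) {
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    public static void main(String[] args) {
        System.out.println(sameWhenLower(WITH_DOTLESS, WITH_DOTTED, Locale.ROOT)); // false
        System.out.println(sameWhenUpper(WITH_DOTLESS, WITH_DOTTED, Locale.ROOT)); // true
        System.out.println(sameWhenUpper(WITH_DOTLESS, WITH_DOTTED, TURKISH));     // false
    }
}
```

The no-argument `toUpperCase()` and `toLowerCase()` use the JVM's default locale. Passing the locale explicitly at least makes the outcome independent of the machine the code runs on.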

There you go… something to think about over the bank holiday.

And if more info is needed: http://en.wikipedia.org/wiki/Dotted_and_dotless_I#In_computing