04.15.07

Chinese Email/Mac OS X/Mail.app/

Posted in Uncategorized at 2:12 am by ryan

乱码的 (“garbled”)

If you’re a Mac user who deals with Chinese, you’ll probably run into these two problems frequently:

1) People sending you emails that are turned into gibberish
2) You sending emails to others that are turned into gibberish

As soon as cracked copies of Vista start being sold in Shanghai (maybe already, just haven’t looked), Chinese folks will start upgrading and will thereby start using Unicode. But in the meantime we still live in a world of “code pages”… Basically, the binary representation of a character can differ depending on the country you live in: plain ASCII letters like “A” are the same almost everywhere, but the bytes that spell a Chinese character under a Chinese code page mean something else entirely under a Western one. So when you transmit files from one country to another (or one language to another), you frequently run into encoding issues.
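Here’s a minimal sketch of what that looks like in practice, assuming an iconv build that knows both GBK and ISO-8859-1 (the common ones do). The same two bytes go through two different decoders; sample_bytes is just a made-up helper name:

```shell
# Emit the two bytes 0xD6 0xD0, written as octal escapes for portability.
sample_bytes() {
    printf '\326\320'
}

# Read as GBK (Simplified Chinese), the pair is one character: 中
as_gbk=$(sample_bytes | iconv -f GBK -t UTF-8)

# Read as ISO-8859-1 (Western European), each byte is its own letter: ÖÐ
as_latin1=$(sample_bytes | iconv -f ISO-8859-1 -t UTF-8)

printf 'GBK:        %s\nISO-8859-1: %s\n' "$as_gbk" "$as_latin1"
```

Neither reading is “wrong” as far as the bytes are concerned; the gibberish only shows up when the reader guesses a different code page than the writer used.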

Most web browsers and email programs have a menu buried somewhere that lets the user change the current encoding, though most users don’t know how to do so… There’s also a utility called “iconv” that can convert between most encodings, but it’s more of a programmer thing…
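For the curious, a typical iconv call looks something like the sketch below. The file names are made up, and the sample file is generated on the spot (in a temporary directory) so the commands have something to work on:

```shell
# Work in a scratch directory so nothing of yours gets overwritten.
workdir=$(mktemp -d)
src="$workdir/message-gbk.txt"    # hypothetical input file, GBK-encoded
dst="$workdir/message-utf8.txt"   # hypothetical output file, UTF-8

# Build a tiny GBK sample: the bytes D6 D0 CE C4 are 中文 ("Chinese") in GBK.
printf '\326\320\316\304\n' > "$src"

# -f names the encoding the file is in, -t the encoding you want out.
iconv -f GBK -t UTF-8 "$src" > "$dst"

cat "$dst"
```

The catch is that you have to know (or guess) the source encoding yourself; iconv converts, it doesn’t detect.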

So, bottom line, if you’re on Mac OS X and want to work with Mainland Chinese folks, you should open up your Terminal.app (under Applications/Utilities) and just paste in the following lines:

defaults write com.apple.mail NSPreferredMailCharset "UTF-8"

defaults write com.apple.mail NSPreferredMailCharset "GBK"

That’s it. Done and done. And yes, even the occasional business traveler to China should do this, because your Chinese contacts should send you their address. Of course that address is in Chinese, so if you expect the cab driver or anyone else to help you find it, you’d better print it out right. This will help.

(Yes, I realize that setting UTF-8 first and then switching to GBK seems superfluous, but I believe it’s a workaround for a bug… More info here…)

There is an article about this on the Apple site, and discussion board, but their suggested encoding is wrong. It should be GBK, not GB.
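To check that the setting stuck, or to undo it later, defaults read and defaults delete take the same domain and key as defaults write. A small sketch, where show_mail_charset is a made-up helper name and the guard is only there because the defaults tool exists on macOS alone:

```shell
# Hypothetical helper (not part of macOS): report Mail.app's preferred charset.
show_mail_charset() {
    # `defaults` ships with macOS only, so say so and stop on other systems.
    if ! command -v defaults >/dev/null 2>&1; then
        echo "defaults not found; this check only applies on macOS"
        return 0
    fi
    # Prints the stored value, or a note if the key was never written.
    defaults read com.apple.mail NSPreferredMailCharset 2>/dev/null ||
        echo "NSPreferredMailCharset is not set"
}

show_mail_charset

# To undo the change and go back to Mail.app's own choice, delete the key:
# defaults delete com.apple.mail NSPreferredMailCharset
```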


04.09.07

Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence

Posted in Uncategorized at 5:26 am by ryan

OK, this one is a bit geeked out again, but it’s relevant to China. If you’re an American, you could probably go your entire life without ever bumping into code pages, but if your life crosses paths with Asia, you almost certainly will…

While we were developing a new website and doing our Subversion (version control system) check-in, I started bumping into a very unusual error.

ryan@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ sudo svn up
svn: Valid UTF-8 data
(hex:)
followed by invalid UTF-8 sequence
(hex: b8 b4 bc fe)

Unfortunately, Google didn’t come up with much. The best hit was an October 10th post on the Subversion users mailing list. Basically, the answer is that there’s no answer.

Well, I did an svn up in each child directory of the one causing the problem and eventually tracked the error down through my project’s directory tree. It looks like one of the guys using a Windows system copied a JPEG with a Chinese GBK-encoded filename onto the server. Everything is best kept in UTF-8.
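Rather than running svn up directory by directory, a scan along these lines could find the culprit in one pass. It’s only a sketch and not what was actually done here: is_utf8 and find_non_utf8 are made-up helper names, it assumes an iconv that supports GBK, and it will trip over file names that contain newlines. For what it’s worth, the four bytes from the error message, b8 b4 bc fe, decode as GBK to 复件 (“copy”), which would fit a file duplicated on a Chinese Windows machine.

```shell
# Succeeds only if the string is valid UTF-8.
# iconv exits non-zero when its input is not valid in the -f encoding.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print every path under a directory (default: current one) whose file name
# is not valid UTF-8, with a guess at what the name says when read as GBK.
find_non_utf8() {
    find "${1:-.}" | while IFS= read -r path; do
        name=${path##*/}
        if ! is_utf8 "$name"; then
            # If the bytes are not valid GBK either, show a placeholder.
            guess=$(printf '%s' "$name" | iconv -f GBK -t UTF-8 2>/dev/null) || guess='?'
            printf '%s (as GBK: %s)\n' "$path" "$guess"
        fi
    done
}
```

Run from the top of the working copy as find_non_utf8 . and each offending file is listed with its GBK reading next to it.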

Once you’ve found the right file, you have to figure out how to delete a file with a name that can’t be typed…

ryan@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ ls
logo02.jpg       ???? logo.jpg  menu_acc_down.jpg      menu_home_down.jpg  menu_work_down.jpg
logo03.jpg       logo.jpg       menu_acc.jpg           menu_home.jpg       menu_work.jpg
logo04.jpg       logo_top1.jpg  menu_cameras_down.jpg  menu_len_down.jpg
logo05.jpg       logo_top2.jpg  menu_cameras.jpg       menu_len.jpg
logo06.jpg       logo_top3.jpg  menu_gall_down.jpg     menu_tech_down.jpg
logo_bottom.jpg  logo_top.jpg   menu_gall.jpg          menu_tech.jpg

In this case, I just used: rm *\ logo.jpg since there was only one file matching this pattern… Next, I could commit again!

ryan@116843:/spike$ sudo svn up
D public/.htaccess
Updated to revision 38.
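Deleting worked here because the file was a stray copy. If a file like that were worth keeping, renaming it to UTF-8 would be the other way out. A sketch under assumptions: to_utf8_name is a hypothetical helper, and it simply trusts that the old name really is GBK.

```shell
# Rename a file whose name is stored as GBK bytes to the same name in UTF-8.
to_utf8_name() {
    old=$1
    # Split off the directory part without relying on external tools.
    case $old in
        */*) dir=${old%/*} ;;
        *)   dir=. ;;
    esac
    # Convert only the file name; give up if the bytes are not valid GBK.
    new=$(printf '%s' "${old##*/}" | iconv -f GBK -t UTF-8) || return 1
    # -i asks before overwriting an existing file that already has the new name.
    mv -i -- "$old" "$dir/$new"
}

# As with the rm above, a glob can stand in for the characters you cannot type:
# to_utf8_name ./*\ logo.jpg
```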