Vikas Patel, Jul 14, 2017

Character Encoding Issues with Node.js

Recently we ran into several character encoding issues in Node.js. This post explains how we handled them with the iconv family of modules. We first hit the problem when we created a JSON file in Lucee and tried to parse that file in Node.js. Node.js uses UTF-8 by default, so when the source file is not UTF-8 encoded you get garbled characters and JSON parsing can fail. We were reading the file with the fs module, which offers only a handful of built-in encodings and not the one we needed. I then realized that Lucee was writing the file in the 'windows-1252' character set, so I had to read the file with that character set. I was able to do that with a Node module called iconv-lite.
 


I've also posted the code as an answer on Stack Overflow.
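As a minimal sketch of that approach (not necessarily the code from the answer), the idea is to read the file as raw bytes, decode those bytes with the right character set, and only then parse the JSON. The helper name readJsonFile, the demo file name, and the sample content are hypothetical. The fallback to Node's built-in TextDecoder is an assumption added so the sketch also runs where iconv-lite is not installed; it needs a Node build with full ICU data.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// iconv-lite is the module named in the post. The TextDecoder fallback is an
// assumption of this sketch so that it also runs without iconv-lite installed.
let iconv = null;
try {
  iconv = require('iconv-lite');
} catch (err) {
  iconv = null; // not installed: use TextDecoder below
}

// Hypothetical helper: read raw bytes, decode them with the given character
// set, then parse the resulting string as JSON.
function readJsonFile(filePath, encoding) {
  const raw = fs.readFileSync(filePath); // no encoding argument, so this is a Buffer
  const text = iconv
    ? iconv.decode(raw, encoding)
    : new TextDecoder(encoding).decode(raw);
  return JSON.parse(text);
}

// Demo: build a windows-1252 encoded file by hand. 0xE9 is "é" in windows-1252.
const demoPath = path.join(os.tmpdir(), 'windows-1252-demo.json');
fs.writeFileSync(demoPath, Buffer.concat([
  Buffer.from('{"name":"Caf', 'ascii'),
  Buffer.from([0xe9]),
  Buffer.from('"}', 'ascii'),
]));

// Reading as UTF-8 turns the lone 0xE9 byte into the replacement character.
const asUtf8 = fs.readFileSync(demoPath, 'utf8');
// Decoding as windows-1252 recovers the intended text.
const parsed = readJsonFile(demoPath, 'windows-1252');

console.log(JSON.stringify(asUtf8));
console.log(parsed.name);
```

The key point is that fs.readFileSync is called without an encoding, so nothing is decoded as UTF-8 before the bytes reach the decoder.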

We hit the same problem when parsing an email with a Node module called mailparser, which also uses iconv-lite. According to the iconv-lite wiki, though, the module supports only a limited set of popular character sets. I had received an email encoded in ISO-2022-JP, a Japanese character set, and the lite version could not decode it.

I checked the mailparser page and found the following issue:

Charset decoding is handled using iconv-lite that is missing some charsets, especially some Japanese ones. If required then it would be possible to switch to native iconv bindings with node-iconv to handle these missing charsets but for now this option is not used for easier packaging.

I tried to use the node-iconv module, but it would not install on Windows. The module has to be built with node-gyp, and node-gyp is so complicated to set up on Windows that I dropped the idea of installing it there. On Linux, node-iconv installed successfully. Next, it was time to change the mailparser code to use node-iconv. Note that the convert method cannot be applied to a stream chunk by chunk: when the content is divided into chunks, a multi-byte character may be split across two of them and will then be decoded incorrectly.
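The chunking problem can be illustrated with a small sketch. To stay dependency-free it swaps in UTF-8 and the built-in Buffer for node-iconv and ISO-2022-JP, so the convert function and the sample text here are stand-ins and not the mailparser or node-iconv code. The effect is the same in kind: a character whose bytes straddle a chunk boundary is corrupted when each chunk is converted on its own.

```javascript
// Stand-in for a charset converter: decodes one Buffer to a string.
// UTF-8 via Buffer#toString is used only to keep the sketch dependency-free.
function convert(buffer) {
  return buffer.toString('utf8');
}

const source = Buffer.from('日本語', 'utf8'); // 9 bytes, 3 per character

// Simulate a stream that splits the content in the middle of the second character.
const chunks = [source.subarray(0, 4), source.subarray(4)];

// Naive: convert every chunk on its own. The split character is lost.
const perChunk = chunks.map(convert).join('');

// Buffered: collect all chunks first, then convert once.
const buffered = convert(Buffer.concat(chunks));

console.log(perChunk === '日本語'); // false
console.log(buffered === '日本語'); // true
```

With node-iconv, the buffered step would be along the lines of new Iconv('ISO-2022-JP', 'UTF-8').convert(Buffer.concat(chunks)). ISO-2022-JP switches character sets with escape sequences, so splitting its byte stream at arbitrary points is at least as risky as in the UTF-8 stand-in above.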

This is why you'll need a buffer: collect all the chunks first, then convert the complete content in one go. I've forked mailparser, and you can view my changes on GitHub. I also had problems with PDF editing modules for Node.js, which I'll explain in the next post. To summarize, Node and Windows are not good friends.