Visual Basic 6, ActiveX and Unicode

One of the older applications I support uses ActiveX controls embedded inside a web page.  These controls request data from a web server to update the information on the page without requesting the whole page again, much in the same way that AJAX is now commonly used.

This has worked fine for the code pages that have been tested: the Latin code pages (ISO-8859-1, ISO-8859-15) and the double-byte code page cp950. However, it did not work when I tried the UTF-8 Unicode code page.

The reason for this is fairly simple:

VB stores strings internally using Unicode, but assumes that the outside world is ANSI.

This means that Visual Basic converts text from ANSI to Unicode (UTF-16) when a string comes in from the outside world, and converts it back to ANSI when the string is passed back out.
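To make the ANSI/Unicode split concrete, here is a small sketch (not from the original controls) that compares VB's internal UTF-16 bytes for a string with the bytes StrConv(..., vbFromUnicode) produces. The output in the comments assumes a Western (Windows-1252) system code page. On another code page the ANSI results will differ.

```vb
' Sketch: internal UTF-16 storage vs. ANSI conversion.
' Assumes a Windows-1252 system code page.
Private Sub DemoAnsiConversion()
    Dim strText As String
    Dim bytUtf16() As Byte
    Dim bytAnsi() As Byte

    ' "café" built with ChrW$ so the .bas file encoding does not matter
    strText = "caf" & ChrW$(&HE9)

    ' Assigning a string to a byte array copies the internal UTF-16LE bytes
    bytUtf16 = strText
    Debug.Print UBound(bytUtf16) + 1     ' 8 (two bytes per character)

    ' StrConv with vbFromUnicode converts using the system ANSI code page
    bytAnsi = StrConv(strText, vbFromUnicode)
    Debug.Print UBound(bytAnsi) + 1      ' 4 (é becomes the single byte &HE9)

    ' A character with no ANSI equivalent (U+4E2D) does not survive the
    ' round trip; it typically comes back as "?" (character code 63)
    Debug.Print AscW(StrConv(StrConv(ChrW$(&H4E2D), vbFromUnicode), vbUnicode))
End Sub
```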

The ActiveX controls use the Microsoft Inet control to request data via HTTP. They call its GetChunk() method in the StateChanged event to read the data into a string. This was the first cause of my problems, because Visual Basic automatically runs the data through an ANSI conversion, which mangles the Unicode characters.

The Inet control's GetChunk() method takes two parameters: size and data type. The size parameter tells it how much data to read, and the data type parameter tells it what type to return the data as. The data was being read into a string (icString), so to avoid the automatic conversion I had to change this to a byte array (icByteArray).

So far so good. But now I had a UTF-8 byte array that I needed to convert into a string without losing data in the conversion process. This was a sticking point: Visual Basic's string conversion function StrConv() can't cope with UTF-8, and none of the API calls I found to convert the string worked. You can assign a byte array directly to a string and no automatic conversion happens. However, strings are stored internally as UTF-16, so the UTF-8 bytes are simply reinterpreted as UTF-16 code units and this does not work either.
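A quick sketch of why the direct assignment fails. The UTF-8 encoding of "é" (U+00E9) is the two bytes C3 A9. Copied straight into a string, they become a single UTF-16 code unit, &HA9C3, which is an unrelated character. The procedure name is hypothetical.

```vb
' Sketch: assigning UTF-8 bytes straight to a String reinterprets them
' as UTF-16LE instead of decoding them.
Private Sub DemoDirectAssignment()
    Dim bytUtf8(0 To 1) As Byte
    Dim strWrong As String

    ' UTF-8 encoding of "é" (U+00E9)
    bytUtf8(0) = &HC3
    bytUtf8(1) = &HA9

    ' No conversion happens: the two bytes become one UTF-16 code unit
    strWrong = bytUtf8

    Debug.Print Len(strWrong)                      ' 1
    ' Mask to 16 bits because AscW returns a signed Integer
    Debug.Print Hex$(AscW(strWrong) And &HFFFF&)   ' A9C3, not E9
End Sub
```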

I was nearly at the stage where I either needed to write my own conversion process, or re-develop the controls in another language with better UTF-8 support.

Then I found this solution:

Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String
    Dim objStream As ADODB.Stream
    Dim strTmp As String

    ' init stream
    Set objStream = New ADODB.Stream
    objStream.Charset = "utf-8"
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeBinary
    objStream.Open

    ' write bytes into stream
    objStream.Write data
    objStream.Flush

    ' rewind stream and read text
    objStream.Position = 0
    objStream.Type = adTypeText
    strTmp = objStream.ReadText

    ' close up and return
    objStream.Close
    ConvertUtf8BytesToString = strTmp
End Function

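This does not use any Windows API calls, but it does require a project reference to the Microsoft ActiveX Data Objects 2.5 Library or later, as ADO 2.5 is the version that introduced the Stream object.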
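If a project reference is undesirable, the same technique can be written late-bound. This is a minimal sketch, not part of the original solution. It creates the stream with CreateObject("ADODB.Stream"), so ADO 2.5 or later must still be installed on the client at run time. Without the reference VB6 does not know the ad* constants, so they are declared locally with the values the ADO type library defines. The function name ConvertUtf8BytesToStringLate is hypothetical.

```vb
' Late-bound sketch: no ADO project reference needed.
' Constant values as defined in the ADO type library.
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3

Public Function ConvertUtf8BytesToStringLate(ByRef data() As Byte) As String
    Dim objStream As Object

    ' Same steps as ConvertUtf8BytesToString, via CreateObject
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Charset = "utf-8"
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeBinary
    objStream.Open

    ' Write the raw UTF-8 bytes
    objStream.Write data

    ' Rewind, switch to text mode and let ADODB decode the UTF-8
    objStream.Position = 0
    objStream.Type = adTypeText
    ConvertUtf8BytesToStringLate = objStream.ReadText

    objStream.Close
    Set objStream = Nothing
End Function
```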

Using this solution, I was able to assign the result of this function to the original internal string variable, and the rest of the code in the controls worked.

strWSConnectReturnData = ConvertUtf8BytesToString(bytWSConnectReturnData)
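For context, here is a sketch of how this line might sit inside the Inet control's StateChanged event, with GetChunk switched to icByteArray as described earlier. It is not the original control code. It assumes an Inet control named Inet1 on the same form and a module-level strWSConnectReturnData. It also relies on LenB returning the byte count of a Variant that holds a Byte array. The chunks are collected first and the whole body is decoded once. A multi-byte UTF-8 character can straddle two chunks, and decoding chunk by chunk would corrupt it.

```vb
' Sketch: collect the whole HTTP response as raw bytes, then decode once.
Private Sub Inet1_StateChanged(ByVal State As Integer)
    Dim vtChunk As Variant
    Dim bytData() As Byte
    Dim lngSize As Long
    Dim lngChunkLen As Long
    Dim lngI As Long

    If State <> icResponseCompleted Then Exit Sub

    ' icByteArray returns the bytes untouched (no ANSI conversion)
    vtChunk = Inet1.GetChunk(1024, icByteArray)
    lngChunkLen = LenB(vtChunk)
    Do While lngChunkLen > 0
        ' Grow the buffer and append this chunk's bytes
        ReDim Preserve bytData(0 To lngSize + lngChunkLen - 1)
        For lngI = 0 To lngChunkLen - 1
            bytData(lngSize + lngI) = vtChunk(LBound(vtChunk) + lngI)
        Next lngI
        lngSize = lngSize + lngChunkLen

        vtChunk = Inet1.GetChunk(1024, icByteArray)
        lngChunkLen = LenB(vtChunk)
    Loop

    ' Decode the complete UTF-8 body in one call
    If lngSize > 0 Then
        strWSConnectReturnData = ConvertUtf8BytesToString(bytData)
    End If
End Sub
```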

The ActiveX controls also read data values from the web page via the DOM and POST them back to the web server. These values need to be converted in the opposite direction, from VB's internal UTF-16 string to UTF-8 bytes, before they can be URL encoded.

Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()
    Dim objStream As ADODB.Stream
    Dim data() As Byte

    ' init stream
    Set objStream = New ADODB.Stream
    objStream.Charset = "utf-8"
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeText
    objStream.Open

    ' write text into stream (ADODB encodes it as UTF-8)
    objStream.WriteText strText
    objStream.Flush

    ' rewind stream and switch to binary to read the encoded bytes
    objStream.Position = 0
    objStream.Type = adTypeBinary
    objStream.Read 3 ' skip the 3-byte UTF-8 byte order mark (EF BB BF) that ADODB writes first
    data = objStream.Read()

    ' close up and return
    objStream.Close
    ConvertStringToUtf8Bytes = data
End Function

This returns a byte array, which I pass directly into a function that URL encodes the byte array and returns a string.

strEncoded = URLEncodeUTF8ByteArray(ConvertStringToUtf8Bytes(DomValue))
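URLEncodeUTF8ByteArray itself is not shown in the post. Below is a hypothetical sketch of what such a function could look like, named UrlEncodeBytesSketch so it is not mistaken for the author's code. Each byte is percent-encoded as %XX except the RFC 3986 unreserved characters (A-Z, a-z, 0-9, "-", ".", "_", "~"). Because it works on bytes, a UTF-8 character such as "é" comes out as %C3%A9. Note that HTML form encoding conventionally writes a space as "+", whereas this sketch emits %20.

```vb
' Hypothetical sketch, not the author's URLEncodeUTF8ByteArray.
' Expects a non-empty byte array (UBound fails on an uninitialised one).
Public Function UrlEncodeBytesSketch(ByRef data() As Byte) As String
    Dim lngI As Long
    Dim bytB As Byte
    Dim strOut As String

    For lngI = LBound(data) To UBound(data)
        bytB = data(lngI)
        Select Case bytB
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                ' unreserved: 0-9, A-Z, a-z, "-", ".", "_", "~"
                strOut = strOut & Chr$(bytB)
            Case Else
                ' everything else becomes %XX with two upper-case hex digits
                strOut = strOut & "%" & Right$("0" & Hex$(bytB), 2)
        End Select
    Next lngI

    UrlEncodeBytesSketch = strOut
End Function
```

For example, `UrlEncodeBytesSketch(ConvertStringToUtf8Bytes("caf" & ChrW$(&HE9)))` returns `caf%C3%A9`.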

Many thanks to Tim Hastings for his solution, as this has saved me a lot of pain!

6 Responses to “Visual Basic 6, ActiveX and Unicode”

  1. Interesting to see how you got around this one.

    I still think you need to make a case to lose this legacy code though 🙂

  2. If you want to make it easy to support Unicode in Visual Basic, then take a look at the UniToolbox control suite, which replaces all the common VB controls with Unicode-aware versions:

    http://www.iconico.com/UniToolbox

  3. That is just fantastic. Had been fighting this damn issue all day, and was also about to give up on it, when I found this.
    I was trying to read an ASCII file and then write it out to another file (RTF format), and it kept adding 2 bytes at the start of the document, so Word or WordPad did not like it anymore.

    So from reading your recipe I got the idea to just advance the position 2 bytes like this, and it works wonders.

    objStream.Position = 2
    GetFile = objStream.ReadText

  4. Here is full Unicode support for VB6, in both design mode and at runtime.
    Complete source code.

    OptionButton

    Checkbox

    Label

    CommandButton

    File I/O

    Clipboard I/O

    Other routines for putting Unicode into the caption of any VB control with a hWnd,
    including forms.

    Just download from here and you’re all set:

    http://motionlabresources.org/Unicode%20&%20RTF%20for%20VB6.zip

  5. Elroy, thanks for that. Been a long time since I’ve been actively doing any VB6 development so I haven’t verified it. Hopefully it will help someone else out.

  6. Hey Paul, it's just source code, so whoever gets it can check it out themselves. Occasionally I've needed some special characters and always done a hack, but I finally decided to just bite the bullet and develop some nice Unicode controls for VB6. Interestingly, the problem has always been the PropertyBag (and the Properties Window). Captions and the RichTextBox (as well as VB6 strings) have always done Unicode. I just bit the bullet and figured out how to get Unicode in the PropertyBag (it was actually quite easy, just a byte array in a variant, and it gets stored in the .FRX file) and used the RichTextBox for allowing editing of the caption property during design time. I still maintain many many thousands of lines of code written in VB6, and I'm just getting more and more in a mindset of sharing these days. You take care.
