Tuesday, July 12, 2011

Converting VARBINARY to VARCHAR yields only opening character

Came across this interesting feature of SQL Server the other week. The system we're working on takes incoming text files and stores their contents in full as VARBINARY fields in order to maintain a complete collection of messages that have been uploaded to the system. Currently we're circumventing this default load process and loading directly into the VARBINARY columns from our external database.

DECLARE @foo NVARCHAR(3)
SET @foo = 'bar'
SELECT CONVERT(VARBINARY,@foo)

This yields the expected Unicode binary: 0x620061007200

However when unpacking these VARBINARY fields to VARCHARs we get the following:

SELECT CONVERT(VARCHAR,CONVERT(varbinary,@foo))

Yielding: 'b'

Did you see the error?

The problem arises when unknowingly boxing an NVARCHAR into a VARCHAR because you're unaware of what the initial datatype was before it got converted to binary. The variable @foo was originally an NVARCHAR that got converted to binary thus retaining the additional UTF-8 bytes. Now when unpacking that binary data into a VARCHAR expecting ASCII data we're only left with the opening character.

Say two different programmers wrote these two separate pieces. The first programmer loads NVARCHARs into the binary fields, and the second programmer keeps trying to extract these binary fields as VARCHARs only to find the data truncated down to a single character. Obviously the best thing to do in this case is to have the programmers communicate with each other in order to maintain data consistency. However if this proves impossible for whatever reason it may prove useful to check to make sure your fields aren't being inadvertently truncated when being unpacked from their binary fields.

tl;dr Just use NVARCHAR for everything and save yourself the headache

No comments: