Tip Sheet: Beware of the Byte Order Mark

When writing text files in VB.NET, encoding is important

VS 2005 includes several new file management wrappers in the My.Computer.FileSystem object, including WriteAllText. This method is a quick and handy way to write small strings to disk, but it has a 'gotcha': by default it uses UTF-8 encoding with a Byte Order Mark. This mark can easily confuse programs that aren't expecting it.

What is a Byte Order Mark? Simply put, it's a short sequence of bytes at the beginning of a Unicode text file that denotes how the file is encoded. For UTF-8, the sequence is EF BB BF, which appears as the characters "ï»¿" when a program reads the file with a single-byte encoding such as Windows-1252.
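As a quick check of that byte sequence, the sketch below asks .NET for the UTF-8 preamble and formats it two ways: as hex, and decoded as ISO-8859-1 (which maps these three bytes to the same characters as Windows-1252). The module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module BomPreambleDemo
    ' Returns the UTF-8 BOM bytes formatted as hex ("EF-BB-BF").
    Function Utf8BomAsHex() As String
        Return BitConverter.ToString(Encoding.UTF8.GetPreamble())
    End Function

    ' Decodes the BOM bytes as ISO-8859-1 to show what a reader
    ' that is unaware of the BOM would treat as ordinary text.
    Function Utf8BomAsLatin1() As String
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Return latin1.GetString(Encoding.UTF8.GetPreamble())
    End Function

    Sub Main()
        Console.WriteLine(Utf8BomAsHex())
        Console.WriteLine(Utf8BomAsLatin1())
    End Sub
End Module
```

How the second line looks on screen depends on the console's own encoding, so the hex form is the more reliable one to compare against.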

Many Windows programs and .NET functions handle this transparently for you, so you never see these characters. For example, if you open a UTF-8 encoded file in Notepad, you won't see them at all. This 'helpfulness' can make it difficult to spot that the file's encoding is the problem.

How can it be a problem? The trouble starts when a program that expects ASCII encoding tries to read the file. For example, you might be sending files you create to a third-party program. It loads the file, sees the Byte Order Mark, assumes the file isn't in the correct format, and rejects it. If you load the file into Notepad it looks OK to you, but the target program can't read it. It can be a frustrating problem to track down.
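Because Notepad hides the mark, one way to track the problem down is to look at the raw bytes. The following is a minimal sketch, not a full encoding detector: it only checks whether a file begins with EF BB BF. The module name BomDetector and the function StartsWithUtf8Bom are hypothetical names chosen for this example.

```vbnet
Imports System
Imports System.IO

Module BomDetector
    ' Returns True when the file starts with the UTF-8 BOM bytes EF BB BF.
    Function StartsWithUtf8Bom(ByVal fileName As String) As Boolean
        Dim buffer(2) As Byte   ' three bytes, indexes 0 to 2
        Dim bytesRead As Integer

        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            ' A single Read is assumed to be enough for three bytes of a local file.
            bytesRead = fs.Read(buffer, 0, buffer.Length)
        End Using

        Return bytesRead = 3 AndAlso
               buffer(0) = &HEF AndAlso
               buffer(1) = &HBB AndAlso
               buffer(2) = &HBF
    End Function

    Sub Main(ByVal args As String())
        ' Pass the path of the file to inspect as the first argument.
        If args.Length > 0 Then
            Console.WriteLine(StartsWithUtf8Bom(args(0)))
        End If
    End Sub
End Module
```

Running this against a file the target program rejected is a quick way to confirm or rule out the Byte Order Mark as the cause.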

My.Computer.FileSystem.WriteAllText behaves differently from its close relatives, the System.IO.StreamWriter class and the System.IO.File.WriteAllText method. Those also write UTF-8 by default, but they do not include the Byte Order Mark. So, if you were using StreamWriter in VS 2002/2003, or used System.IO.File.WriteAllText elsewhere, and then switched to My.Computer.FileSystem.WriteAllText, the output files you're producing are different.
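The difference is easy to see by writing the same string both ways and dumping the resulting bytes. This sketch calls Microsoft.VisualBasic.FileIO.FileSystem directly, on the assumption that My.Computer.FileSystem forwards to it, so that the code does not depend on a project type that generates the My namespace. The module and function names are illustrative only.

```vbnet
Imports System
Imports System.IO

Module DefaultEncodingComparison
    ' Writes "abc" with the chosen API's default encoding and
    ' returns the bytes of the resulting file as hex.
    Function DefaultOutputAsHex(ByVal useVisualBasicFileSystem As Boolean) As String
        Dim fileName As String = Path.GetTempFileName()
        Try
            If useVisualBasicFileSystem Then
                ' Same defaults as My.Computer.FileSystem.WriteAllText (assumed proxy).
                Microsoft.VisualBasic.FileIO.FileSystem.WriteAllText(fileName, "abc", False)
            Else
                File.WriteAllText(fileName, "abc")
            End If
            Return BitConverter.ToString(File.ReadAllBytes(fileName))
        Finally
            File.Delete(fileName)
        End Try
    End Function

    Sub Main()
        Console.WriteLine("VB FileSystem: " & DefaultOutputAsHex(True))   ' EF-BB-BF-61-62-63
        Console.WriteLine("System.IO:     " & DefaultOutputAsHex(False))  ' 61-62-63
    End Sub
End Module
```

The three extra leading bytes in the first result are the Byte Order Mark; the text itself is identical.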

So, to avoid this kind of problem, use the overload of My.Computer.FileSystem.WriteAllText that takes an encoding and avoid the one that doesn't. In fact, my recommendation is to always specify the right encoding explicitly, no matter which method you use.

USE:

My.Computer.FileSystem.WriteAllText(MyFilename, MyString, False, System.Text.Encoding.ASCII)

System.IO.File.WriteAllText(MyFilename, MyString, System.Text.Encoding.ASCII)

Using sw As New System.IO.StreamWriter(MyFilename, False, System.Text.Encoding.ASCII)
    'writing code here
End Using

OK (but not recommended):

System.IO.File.WriteAllText(MyFilename, MyString)

Using sw As New System.IO.StreamWriter(MyFilename, False)
    'writing code here
End Using

AVOID:

My.Computer.FileSystem.WriteAllText(MyFilename, MyString, False)
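One caveat with the ASCII examples above: System.Text.Encoding.ASCII replaces any character outside the ASCII range with "?". If the text may contain such characters and the receiving program accepts UTF-8, a possible alternative is to pass a UTF8Encoding instance constructed with False, which turns the Byte Order Mark off. The sketch below assumes that case; WriteUtf8NoBom is a hypothetical helper name, and Microsoft.VisualBasic.FileIO.FileSystem is used in place of My.Computer.FileSystem so the code does not depend on the My namespace being available.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module Utf8WithoutBom
    ' Writes text as UTF-8 with no Byte Order Mark.
    ' Passing False to the UTF8Encoding constructor suppresses the preamble.
    Sub WriteUtf8NoBom(ByVal fileName As String, ByVal contents As String)
        Dim utf8NoBom As New UTF8Encoding(False)
        Microsoft.VisualBasic.FileIO.FileSystem.WriteAllText(fileName, contents, False, utf8NoBom)
    End Sub

    Sub Main()
        Dim fileName As String = Path.GetTempFileName()
        ' ChrW(233) is the letter e with an acute accent, a non-ASCII character.
        WriteUtf8NoBom(fileName, "caf" & ChrW(233))
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(fileName)))
        File.Delete(fileName)
    End Sub
End Module
```

Whether this or plain ASCII is the "right" encoding depends entirely on what the consuming program expects, so check its documentation before switching.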


Published on Tuesday, May 22, 2007   |   © 1999-2007 J. Frank Carr, All Rights Reserved