Now In The Code Bin: All Unicode Characters DB
12:26:09 AM

I started my career in the IT business 27 years ago as an international products specialist at Wang Laboratories. I became something of an expert on dealing with the different character sets supported on different localized versions of Wang hardware and software, and one of the indispensable tools that I used was the All Characters Document. It was a Wang Word Processing document that contained one of every possible character that could be represented, conveniently laid out in a matrix with 16 characters per row, making it easy to read off the hexadecimal byte value of each character. It was not a big document, as in those days all character sets were just 7 or 8 bits, and Wang systems did not support any type of ISO 2022 style escape mechanism to extend character sets.
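
For anyone who never saw a document like that, the layout is easy to reproduce. What follows is only a minimal sketch in Python, not the Wang document itself: the function name `char_matrix`, the choice of Latin-1 as the default character set, and the `.` placeholder for control or undefined bytes are all assumptions made for illustration.

```python
def char_matrix(encoding="latin-1", placeholder="."):
    """Return 17 text lines: a column header plus 16 rows of 16 characters."""
    # Header row: the low hex digit of each column.
    lines = ["    " + " ".join(f"{col:X}" for col in range(16))]
    for row in range(16):
        cells = []
        for col in range(16):
            byte_value = row * 16 + col
            try:
                ch = bytes([byte_value]).decode(encoding)
            except UnicodeDecodeError:
                ch = placeholder  # byte is undefined in this character set
            if not ch.isprintable():
                ch = placeholder  # control characters and similar
            cells.append(ch)
        # Row label: the high hex digit followed by 0.
        lines.append(f"{row:X}0  " + " ".join(cells))
    return lines


if __name__ == "__main__":
    print("\n".join(char_matrix()))
```

Reading a value off the matrix works the same way as on paper: the character in row 40, column 1 has the byte value 0x41, which is "A" in Latin-1.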

Even after I moved into working on email products at Wang, I did follow the evolving standards, so I was aware of what was going on with ISO 10646 and Unicode, but over the years, my knowledge of internationalization and localization has lagged a bit. On modern systems, most of the time there's nothing much to worry about, because everything just works. In the Lotus Notes and Domino world everything is LMBCS or a properly tagged MIME character set, conversions are done automatically when you need them in LotusScript and Java, and if you're working in the C API there are functions available for conversion whenever necessary. The few occasions where I've had to dig into character set issues have typically involved uncertainty about the system or JVM default character sets rather than any confusion about what's going on in the Lotus software.

A few months ago, however, I started looking into a MIME conversion issue involving a subset of the Japanese characters known as "Hankaku-Kana". I won't go into the details of the issue here, other than to say that it took me much longer than it really should have to analyze, understand, and solve the problem -- and that even when I had the solution I was thrown off track by one particular document, because it turned out to contain damaged LMBCS. About halfway through the process I realized that what I really needed was the Lotus Notes and Domino equivalent of the All Characters Document, so I created it.

Well... what I created is more than just a document. I created a database with one document per character, and a simple view that displays them. It's technically an All Unicode Characters Database, since the starting point for it was a file on the Unicode Consortium web site called UnicodeData.txt (there's a description of the format of the file here). I'm not 100% certain that everything in LMBCS is also in Unicode, and I'm not sure that everything in Unicode is actually in the file, so there may be some gaps, but for many if not most intents and purposes, I think my database is a complete enough reference. Here is a screen shot of the All Characters view:

[Screen shot: All Characters view]
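
For readers who want to work with UnicodeData.txt directly, here is a minimal Python sketch of reading it. It is not the code used to build the database. It relies only on the documented layout of the file: one line per entry, fields separated by semicolons, with the code point in hexadecimal first, then the character name, then the general category. One detail bears on the question of gaps: some large ranges (CJK ideographs, for example) appear in the file only as a pair of lines whose names end in ", First>" and ", Last>", so a naive reader gets two entries where thousands of characters exist. The names `CharRecord` and `parse_unicode_data`, the generated names for range members, and the shortened range in the sample data are assumptions made for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class CharRecord(NamedTuple):
    code_point: int
    name: str
    category: str


def parse_unicode_data(lines: Iterable[str]) -> Iterator[CharRecord]:
    """Yield one record per character, expanding First/Last range pairs."""
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = code_point  # remember where the range begins
            continue
        if name.endswith(", Last>") and range_start is not None:
            base = name[1:-len(", Last>")]  # "<CJK Ideograph, Last>" -> "CJK Ideograph"
            for cp in range(range_start, code_point + 1):
                # Placeholder name; not the official character name.
                yield CharRecord(cp, f"{base}-{cp:04X}", category)
            range_start = None
            continue
        yield CharRecord(code_point, name, category)


# Sample lines in the file's format. The CJK range is cut down to three
# code points here purely to keep the example small.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
4E02;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
FF71;HALFWIDTH KATAKANA LETTER A;Lo;0;L;<narrow> 30A2;;;;N;;;;;
"""

records = list(parse_unicode_data(SAMPLE.splitlines()))
for rec in records:
    print(f"U+{rec.code_point:04X}  {rec.category}  {rec.name}")
```

With the real file, the same function could be fed an open file object instead of the sample string.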


In addition to creating the individual documents per character, I also created a couple of true All Characters Documents. I created one document that has 16 characters per line of rich text, and another with each character rendered on its own line of rich text. I did not go to the trouble of programming a nice tabular matrix, so it's not quite as easy to work with these documents as it was with the one that I used at Wang, but in both cases I did put the code point values in the document along with the characters, so these documents can serve as nice self-documenting test case data for any process that needs to work on rich text and be sure that it is handling all character conversions correctly. Here's a screen shot of the one-character-per-line document:

[Screen shot: one-character-per-line document]
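
The idea of self-documenting test data can be sketched in a few lines of Python. This is plain text rather than Notes rich text, and the function names and the tab-separated "U+XXXX" label format are assumptions made for illustration, not the format used in the database. The point is that each line carries its own expected code point, so damage introduced by a conversion can be detected from the line alone.

```python
def one_per_line(code_points):
    """One labelled character per line, e.g. 'U+0041<TAB>A'."""
    return [f"U+{cp:04X}\t{chr(cp)}" for cp in code_points]


def sixteen_per_line(code_points):
    """Up to 16 characters per line, labelled with the range they cover."""
    cps = list(code_points)
    lines = []
    for i in range(0, len(cps), 16):
        chunk = cps[i:i + 16]
        label = f"U+{chunk[0]:04X}..U+{chunk[-1]:04X}"
        lines.append(label + "\t" + "".join(chr(cp) for cp in chunk))
    return lines


def verify_line(line):
    """Check that a one-per-line entry still matches its own label."""
    label, ch = line.split("\t")
    return len(ch) == 1 and ord(ch) == int(label[2:], 16)


if __name__ == "__main__":
    # Halfwidth katakana A and I, as a small sample.
    for line in one_per_line(range(0xFF71, 0xFF73)):
        good = line.encode("utf-8").decode("utf-8")
        bad = line.encode("ascii", "replace").decode("ascii")
        print(verify_line(good), verify_line(bad))
```

A lossless round trip leaves every line verifiable, while a lossy one (here, forcing the text through ASCII) turns the character into "?" and the check fails.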


One caveat to bear in mind when working with this database is that its ability to actually show all Unicode characters is potentially limited by a couple of things. There are large blocks of Unicode code points that are not rendered as characters on the screen on my computer, and for any given case I don't really know what the limiting factor is. I can think of two. First, I've already mentioned that I'm not sure that everything in LMBCS is in Unicode, but I'm also not sure that everything in Unicode is in LMBCS. In fact, it wouldn't surprise me at all if there are things that have been added to Unicode since LMBCS was finalized. Any Unicode characters that are not available in LMBCS will not display correctly in the database. Second, fonts come into play. I set up the form and rich text to use the font @Arial Unicode MS, but I'm not 100% certain that this font really renders all of Unicode. Naturally, any character that this font doesn't render correctly will not display correctly in the database. As I said above, however, for most intents and purposes this database is complete enough.
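
One way to narrow down the cause for a given code point is to rule out a third possibility first: some code points are not expected to show a glyph in any font or character set, because they are controls, invisible format characters, surrogates, private use code points, or unassigned. The sketch below uses Python's standard `unicodedata` module to check the general category. It cannot say anything about LMBCS coverage or about what a particular font contains, and its answers for unassigned code points depend on the Unicode version bundled with the Python in use. The function name and the wording of the explanations are assumptions made for illustration.

```python
import unicodedata

# General categories that normally have no visible glyph of their own.
NOT_DISPLAYABLE = {
    "Cc": "control character",
    "Cf": "format character (usually invisible)",
    "Cs": "surrogate code point (not a character on its own)",
    "Co": "private use (glyph depends entirely on the font)",
    "Cn": "unassigned or noncharacter",
}


def why_not_rendered(code_point):
    """Return a reason if the code point is not expected to render, else None."""
    category = unicodedata.category(chr(code_point))
    return NOT_DISPLAYABLE.get(category)


if __name__ == "__main__":
    for cp in (0x0041, 0x0000, 0xD800, 0xE000, 0xFFFF, 0xFF71):
        reason = why_not_rendered(cp) or "should render; check LMBCS and font"
        print(f"U+{cp:04X}: {reason}")
```

A result of None means the code point is an ordinary assigned character, so a blank or a box on screen would point back at the two factors described above.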

At 21 MB, the database is a bit larger than I want to have downloaded directly from my blog, and unlike many Notes databases it doesn't compress much at all when zipped, so I've uploaded it to the Code Bin on the OpenNTF site. You can download the database UnicodeData.nsf from there.

UPDATE: A new version has been posted. Read about it here.

Comments

1. Dennis van Remortel 02/15/2010 05:04:10 AM


2. Ben Langhinrichs 02/15/2010 10:41:56 AM

Terrific resource, Rich! Thanks for making this available.

3. David Schaffer 02/15/2010 01:46:49 PM

I can't open the downloaded database. Says it has local encryption.

4. Richard Schwartz 02/15/2010 02:06:34 PM

No wonder zip didn't shrink it previously! That really should have been a clue for me to look at the encryption.

I'm in the process of posting a non-encrypted version, zipped.

5. Richard Schwartz 02/15/2010 02:25:10 PM

Non-encrypted version is now posted -- in zip file this time, and the link in the article has been updated to point to it.

12. Ranbirkapoor 01/25/2017 01:48:10 AM

Pretty nice information. it has a better understanding. thanks for spending time on it.


All opinions expressed here are my own, and do not represent positions of my employer.