Version 1.1 Of My All Characters Database Is Posted To The Code Bin
I've finally gotten around to polishing it up a bit and posting it. Click here to get version 1.1 of my All Characters Database. It contains...

Individual Notes documents for every Unicode character.


Each document shows one rendered character, the 16-bit Unicode code point value in hex and in decimal, the UTF-8 representation of the code point in hex, the Unicode standard name for the character, and a bunch of other information about the character taken from the standard UnicodeData.txt file maintained by the Unicode Consortium. That includes stuff to make any character set geek salivate: for Ü (0x00DC), for example, the lowercase mapping to ü (0x00FC) and the decomposition into the component characters U (0x0055) and dieresis (0x0308).
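
For anyone curious how those fields line up with the raw data, here is a minimal Python sketch (not the agent code in the database) that turns one semicolon-separated UnicodeData.txt record into the values a character document shows. It assumes the standard 15-field layout, where field 1 is the name, field 5 the decomposition and field 13 the simple lowercase mapping; the function name `describe` and the dictionary keys are illustrative only.

```python
# Field positions in a UnicodeData.txt record (15 semicolon-separated fields).
NAME, DECOMPOSITION, LOWERCASE = 1, 5, 13


def describe(record: str) -> dict:
    """Build per-character details from one UnicodeData.txt line (illustrative sketch)."""
    fields = record.rstrip("\n").split(";")
    cp = int(fields[0], 16)
    ch = chr(cp)
    lower = fields[LOWERCASE]
    # A decomposition may begin with a tag such as <compat>; keep only the code points.
    decomp = [f for f in fields[DECOMPOSITION].split() if not f.startswith("<")]
    return {
        "char": ch,
        "hex": f"0x{cp:04X}",
        "decimal": cp,
        # Note: surrogate code points (D800-DFFF) cannot be UTF-8 encoded and would raise here.
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "name": fields[NAME],
        "lowercase": f"0x{lower}" if lower else None,
        "decomposition": [f"0x{d}" for d in decomp],
    }


line = ("00DC;LATIN CAPITAL LETTER U WITH DIAERESIS;Lu;0;L;0055 0308;;;;N;"
        "LATIN CAPITAL LETTER U DIAERESIS;;;00FC;")
info = describe(line)
print(info["hex"], info["decimal"], info["utf8"])  # 0x00DC 220 C3 9C
print(info["lowercase"], info["decomposition"])    # 0x00FC ['0x0055', '0x0308']
```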

There are three views organizing these documents and displaying useful info. One is sorted by LMBCS code points.


There is also a view sorted by the actual character values, so it should show them in your system's local sort order, though you might have to rebuild the view to see that. Finally, and probably most useful for anyone dealing with characters as they are known to most of the rest of the world, there is a view sorted by Unicode UTF-16 code points.
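
To make the difference between the last two orderings concrete, here is a small Python sketch of a UTF-16 sort key. It is only an illustration of the ordering, not the Notes column formula the view actually uses, and `utf16_key` is a hypothetical name. For characters in the 16-bit range the UTF-16 value is the code point itself, so this is plain numeric order rather than anything locale-aware.

```python
def utf16_key(s: str) -> list[int]:
    """Return the UTF-16 code units of s, usable as a sort key (illustrative sketch)."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


chars = ["é", "a", "Ü", "Z"]
print(sorted(chars, key=utf16_key))  # ['Z', 'a', 'Ü', 'é']

# Beyond 16 bits the two orders diverge: U+1F600 becomes the surrogate pair
# D83D DE00, which sorts before U+FFFD in UTF-16 order but after it by code point.
print([hex(u) for u in utf16_key("\U0001F600")])   # ['0xd83d', '0xde00']
print(utf16_key("\U0001F600") < utf16_key("\uFFFD"))  # True
```

A locale-sensitive view, by contrast, would typically group "a", "Ü" and "é" near their base letters instead of pushing the accented letters past the whole ASCII range.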

There is also a single large "All Unicode Characters In Rich Text" document containing every single character.


The information in this document is largely redundant with what you see in the database views, but this is where the database goes from reference source to useful tool. If you have written code for processing Notes rich text and you want to make sure that you're handling all LMBCS characters correctly, you can feed this document through it and check that the output is what you expect.
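
The checking step can be as simple as a character-by-character comparison. The Python sketch below assumes you have already extracted the reference text and your code's output into strings (how you do that depends on your own tooling); `find_mismatches` and `lossy_processor` are hypothetical names, and the processor is a deliberately broken stand-in for "your rich text code".

```python
def find_mismatches(expected: str, actual: str) -> list[str]:
    """Report positions where processed output differs from the reference characters."""
    problems = []
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            problems.append(f"position {i}: expected U+{ord(want):04X}, got U+{ord(got):04X}")
    if len(expected) != len(actual):
        problems.append(f"length differs: expected {len(expected)}, got {len(actual)}")
    return problems


def lossy_processor(text: str) -> str:
    # Hypothetical stand-in for your rich text code: anything outside Latin-1 becomes "?".
    return text.encode("latin-1", errors="replace").decode("latin-1")


reference = "AÜé€"  # hypothetical slice of the reference document's text
print(find_mismatches(reference, lossy_processor(reference)))
# ['position 3: expected U+20AC, got U+003F']
```

Reporting code points rather than the characters themselves keeps the output readable even when the mangled characters don't display properly.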

Finally, if you look beneath the covers of the database, you will find the agents that I used to pull information from the UnicodeData.txt file and populate the database with documents for all the individual characters, and the agent that creates the "All Unicode Characters In Rich Text" document.
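
One wrinkle anyone writing a similar import has to handle is that UnicodeData.txt does not list every character individually: large blocks such as the CJK ideographs appear as a pair of "<..., First>" and "<..., Last>" lines. The Python sketch below is not the agents' code; it shows one way to expand those ranges, and it assumes, as the 16-bit code point field suggests, that only characters up to 0xFFFF are wanted. `iter_code_points` and the `max_cp` parameter are hypothetical names.

```python
from typing import Iterable, Iterator, Tuple


def iter_code_points(lines: Iterable[str], max_cp: int = 0xFFFF) -> Iterator[Tuple[int, str]]:
    """Yield (code point, name) pairs from UnicodeData.txt lines, expanding First/Last ranges."""
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            range_start = cp  # the matching "Last>" line follows
            continue
        if name.endswith(", Last>") and range_start is not None:
            # The file gives no individual names for range members; reuse the range label.
            label = name[1:-len(", Last>")]
            for c in range(range_start, min(cp, max_cp) + 1):
                yield c, label
            range_start = None
            continue
        if cp <= max_cp:
            yield cp, name


sample = [  # shortened, illustrative lines (the real CJK range is much larger)
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "4E02;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
    "1F600;GRINNING FACE;So;0;ON;;;;;N;;;;;",
]
for cp, name in iter_code_points(sample):
    print(f"0x{cp:04X} {name}")
```

With the default limit, GRINNING FACE (0x1F600) is skipped because it lies outside the 16-bit range.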

(One last note: if you go to the Code Bin, you will see four documents labeled "All Unicode Characters Database". The second one, which is the only one containing the file, is the one you want. The others are the original and my attempts to edit the original to point to the updated version. I made those edits without realizing/remembering that the design of the Code Bin apparently includes versioning, so updates become new documents rather than overwriting what is there. That is not particularly user-friendly, given that the button one clicks just says "Edit Document" rather than "Create New Version".)

1. Erik Brooks, 03/08/2010 10:10:30 PM

This is fantastic - thanks for the effort!

2. Dennis van Remortel, 03/09/2010 03:26:21 AM

awesome! Thanks!

3. Robert Ibsen Voith, 03/25/2010 05:28:44 AM

Very cool initiative. However, the latest 1.1 database on OpenNTF is encrypted...

