rs-unicode

New C++ Unicode library


Project maintained by CaptainCrowbar

Unicode Library

By Ross Smith

GitHub repository

Overview

#include "rs-unicode.hpp"
namespace RS::Unicode;

Introduction

This is my new C++ Unicode library.

The library is designed on the assumption that text processing will normally be done entirely with known-valid UTF-8 text, with unvalidated text only being encountered during input sanitization.
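To make the "validate once at the boundary" idea concrete, here is a minimal standalone sketch of a UTF-8 well-formedness check, using only the standard library. The name is_valid_utf8 is hypothetical and is not taken from this library's API; the library's own validation functions may look quite different. The check rejects overlong encodings, surrogate code points, truncated sequences, and anything above U+10FFFF.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; NOT part of rs-unicode.
// Returns true if the input is well-formed UTF-8.
inline bool is_valid_utf8(std::string_view text) noexcept {
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const auto b0 = static_cast<unsigned char>(text[i]);
        if (b0 < 0x80) {
            ++i;  // ASCII: single byte, always valid
            continue;
        }
        std::size_t len = 0;   // total bytes in this sequence
        char32_t cp = 0;       // decoded code point
        char32_t min = 0;      // smallest code point allowed for this length
        if (b0 >= 0xc2 && b0 <= 0xdf) {
            len = 2; cp = b0 & 0x1f; min = 0x80;
        } else if (b0 >= 0xe0 && b0 <= 0xef) {
            len = 3; cp = b0 & 0x0f; min = 0x800;
        } else if (b0 >= 0xf0 && b0 <= 0xf4) {
            len = 4; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i < len) {
            return false;  // sequence is truncated
        }
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(text[i + k]);
            if ((b & 0xc0) != 0x80) {
                return false;  // not a continuation byte
            }
            cp = (cp << 6) | (b & 0x3f);
        }
        if (cp < min) {
            return false;  // overlong encoding
        }
        if (cp >= 0xd800 && cp <= 0xdfff) {
            return false;  // surrogate code point
        }
        if (cp > 0x10ffff) {
            return false;  // beyond the Unicode code space
        }
        i += len;
    }
    return true;
}
```

With a check like this applied where untrusted bytes enter the program, everything downstream can assume valid UTF-8 and skip per-call validation, which is the trade-off the design above relies on.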

My original library (Unicorn) hasn’t been updated for many years. There have been enough changes to the Unicode standard since then that it will not work with a significant fraction of existing Unicode text (in particular, anything that uses emoji). Given that, along with the library’s size (it contains many features I no longer consider worthwhile) and some other design decisions I wanted to revisit, I decided that writing a new library (with some code imported from the old one) was the best approach at this point.

Index of headers

Using the library

You will need my header-only core utility library.

There is a CMakeLists.txt file that can build and install the Unicode library using the usual CMake conventions. Command line usage will typically look like this:

cd wherever/you/installed/rs-unicode
mkdir build
cd build
cmake -G "Unix Makefiles" ../src
    # or cmake -G "Visual Studio 17 2022" ../src on Windows
cmake --build . --config Release -- -j<N>
    # where <N> is your CPU core count
cmake --build . --config Release --target install

The library’s public headers are listed above (other headers are for internal use only and should not be included by your code). To use the library, #include either the individual headers you want, or rs-unicode.hpp to include all of them.

Link your build with -lrs-unicode. You will also need -lpcre2-8. On some systems, you may also need -liconv.

General implementation notes

In a library that does string manipulation, any function that constructs or modifies a string runs the risk of a memory allocation error. This possibility is not usually explicitly documented because it is so ubiquitous. Unless the documentation explicitly says otherwise, any function that is not marked noexcept should be assumed to be capable of throwing std::bad_alloc, in addition to any exceptions explicitly documented for it.
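As a rough illustration of what this means for calling code, the sketch below wraps any string-producing operation and turns an allocation failure into an empty result. The name try_string_op is hypothetical and is not part of this library; it only shows one way a caller might treat std::bad_alloc as a possible outcome of any function that is not marked noexcept. Other exceptions propagate unchanged.

```cpp
#include <new>
#include <optional>
#include <string>
#include <utility>

// Hypothetical wrapper for illustration only; NOT part of rs-unicode.
// Runs a callable that returns something convertible to std::string.
// If the call runs out of memory, returns an empty optional instead of
// letting std::bad_alloc escape. Any other exception is not caught.
template <typename F>
std::optional<std::string> try_string_op(F&& f) {
    try {
        return std::string(std::forward<F>(f)());
    } catch (const std::bad_alloc&) {
        return std::nullopt;
    }
}
```

Whether to catch std::bad_alloc at all is a design choice: many programs simply let it propagate and terminate, since recovering from memory exhaustion is rarely practical.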