utf-42
🛠️ Technical documentation

This section explains how to use the project as well as how to install it.

🚀 Usage

This section explains how to use the project.


Basic example 1

Approach: single-use string literal.

We use the macro make_poly_enc(char_t, lit) to obtain the string literal lit encoded, at compile time, for the desired character type char_t. This is particularly useful when the character type is a template parameter.

#include <string_view>
#include <utf42/utf42.h>
// Alias of char32_t: the same type, so it selects the UTF-32 encoding
using char42_t = char32_t;
// Different encoding views (char8_t requires C++20)
constexpr std::basic_string_view<char> strv_a = make_poly_enc(char, "Hello World \U0001F600!");
constexpr std::basic_string_view<char8_t> strv_8 = make_poly_enc(char8_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char16_t> strv_16 = make_poly_enc(char16_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char32_t> strv_32 = make_poly_enc(char32_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char42_t> strv_42 = make_poly_enc(char42_t, "Hello World \U0001F600!");

All variables above refer to the same logical string, encoded differently by the compiler.
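To make "encoded differently by the compiler" concrete, the following is a minimal C++17 sketch of the general technique a macro like make_poly_enc can rely on: token pasting adds each encoding prefix (u, U, L) to the same literal, and a function template returns the view that matches the requested character type. pick_literal, DEMO_POLY_ENC and greeting_units are hypothetical names; this is not utf42's actual implementation, and the char8_t (u8) variant is left out so the sketch also compiles as C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical helper (not utf42's implementation): receives one literal in
// several encodings and returns the view whose character type is char_t.
template <typename char_t>
constexpr std::basic_string_view<char_t> pick_literal(
    std::string_view narrow, std::u16string_view utf16,
    std::u32string_view utf32, std::wstring_view wide) noexcept
{
    if constexpr (std::is_same_v<char_t, char>)
        return narrow;
    else if constexpr (std::is_same_v<char_t, char16_t>)
        return utf16;
    else if constexpr (std::is_same_v<char_t, char32_t>)
        return utf32;
    else {
        static_assert(std::is_same_v<char_t, wchar_t>, "unsupported character type");
        return wide;
    }
}

// Hypothetical macro: ## pastes the prefixes u, U and L onto the literal, so the
// compiler itself emits the UTF-16, UTF-32 and wide encodings at compile time.
#define DEMO_POLY_ENC(char_t, lit) pick_literal<char_t>(lit, u##lit, U##lit, L##lit)

// Generic code: the character type is a template parameter.
template <typename char_t>
constexpr std::size_t greeting_units() noexcept
{
    return DEMO_POLY_ENC(char_t, "Hello World \U0001F600!").size();
}
```

Here greeting_units<char16_t>() evaluates to 15, because the emoji needs a surrogate pair in UTF-16, while greeting_units<char32_t>() evaluates to 14. The narrow length depends on the compiler's execution character set.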

Converting back to UTF-8 (for display)

std::string str_a (strv_a);
std::string str_8 (char8_to_char(strv_8));
std::string str_16 (utf8::utf16to8(strv_16));
std::string str_32 (utf8::utf32to8(strv_32));
std::string str_42 (utf8::utf32to8(strv_42));

⚠️ Conversion helpers such as utf8::utf16to8 come from the external utfcpp library (#include <utf8cpp/utf8.h>) and are not part of utf42.

See the utfcpp documentation for more information.
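If you only need a quick conversion for display and would rather not add a dependency, the conversion itself is short. The following is a minimal sketch under stated assumptions: utf32_to_utf8, utf16_to_utf8 and append_utf8 are hypothetical helpers, not part of utf42 or utfcpp, and they substitute U+FFFD for invalid code points or lone surrogates instead of reporting an error. For production code, a maintained library such as utfcpp remains the safer choice.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of utf42 or utfcpp): appends the UTF-8 encoding
// of one code point. Surrogates and values above U+10FFFF become U+FFFD.
inline void append_utf8(std::string& out, char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-32 to UTF-8: each code unit is already a whole code point.
inline std::string utf32_to_utf8(std::u32string_view in)
{
    std::string out;
    for (char32_t cp : in)
        append_utf8(out, cp);
    return out;
}

// UTF-16 to UTF-8: a high surrogate followed by a low surrogate is combined
// into one code point; lone surrogates end up as U+FFFD.
inline std::string utf16_to_utf8(std::u16string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With these helpers, str_16 and str_32 from the example above could be built as utf16_to_utf8(strv_16) and utf32_to_utf8(strv_32).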


Basic example 2

Approach: reusable string literal (multiple uses).

We use an instance of utf42::poly_enc to store all encoded variants of the literal and retrieve the desired one later. The macro cons_poly_enc(lit) constructs this object from a single string literal.

#include <utf42/utf42.h>
// Typedef of char32_t
using char42_t = char32_t;
// Create all different encoding string views
constexpr utf42::poly_enc oText = cons_poly_enc("Hello World \U0001F600!");

The object above holds the same logical string, encoded differently by the compiler for each character type.

Converting back to UTF-8 (for display)

// Re-encode everything to utf-8
std::string str_a(oText.TXT_CHAR);
std::string str_8(char8_to_char(oText.TXT_CHAR_8));
std::string str_16(utf8::utf16to8(oText.TXT_CHAR_16));
std::string str_32(utf8::utf32to8(oText.TXT_CHAR_32));
std::string str_42(utf8::utf32to8(oText.visit<char42_t>()));

In generic code, where the character type is a template parameter, use the member function template template<CharType char_t> constexpr std::basic_string_view<char_t> utf42::poly_enc::visit() const noexcept to retrieve the matching view.
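The following is a minimal C++17 sketch of the idea behind such a container and its visit() member: it stores one view per encoding and selects the matching one with if constexpr. demo_poly_enc, its members, DEMO_CONS_POLY_ENC and code_units are hypothetical stand-ins, not utf42's definitions; the real utf42::poly_enc also provides the char8_t view (TXT_CHAR_8), which is omitted here so the sketch compiles as C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for a poly_enc-style container (names are illustrative,
// not utf42's): one view per encoding of the same literal.
struct demo_poly_enc
{
    std::string_view    narrow;
    std::u16string_view utf16;
    std::u32string_view utf32;

    // Returns the stored view whose character type is char_t.
    template <typename char_t>
    constexpr std::basic_string_view<char_t> visit() const noexcept
    {
        if constexpr (std::is_same_v<char_t, char>)
            return narrow;
        else if constexpr (std::is_same_v<char_t, char16_t>)
            return utf16;
        else {
            static_assert(std::is_same_v<char_t, char32_t>, "unsupported character type");
            return utf32;
        }
    }
};

// Hypothetical macro: builds every encoding of one literal in a single object.
#define DEMO_CONS_POLY_ENC(lit) demo_poly_enc{lit, u##lit, U##lit}

// Generic code: works for any supported char_t without repeating the literal.
template <typename char_t>
constexpr std::size_t code_units(const demo_poly_enc& text) noexcept
{
    return text.visit<char_t>().size();
}

// The literal is written once and reused afterwards.
constexpr demo_poly_enc greeting = DEMO_CONS_POLY_ENC("Hello World \U0001F600!");
```

Because the literal appears only once, code_units<char16_t>(greeting) and code_units<char32_t>(greeting) (15 and 14 code units respectively) both read from the same object.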

⚠️ Conversion helpers such as utf8::utf16to8 come from the external utfcpp library (#include <utf8cpp/utf8.h>) and are not part of utf42.

See the utfcpp documentation for more information.


Output

#include <iostream>
std::cout << "Original: " << str_a << '\n';
std::cout << "utf-8: " << str_8 << '\n';
std::cout << "utf-16: " << str_16 << '\n';
std::cout << "utf-32: " << str_32 << '\n';
std::cout << "utf-42: " << str_42 << '\n';

Output on the terminal:

Original: Hello World 😀!
utf-8: Hello World 😀!
utf-16: Hello World 😀!
utf-32: Hello World 😀!
utf-42: Hello World 😀!

⚠️ Terminal encoding requirement

Your terminal MUST be configured to use UTF-8 for this example to print correctly. Otherwise, re-encode the text for the terminal's encoding or change the terminal's configuration.

If the terminal encoding is not UTF-8:

  • Unicode characters may appear as ?, □, or mojibake
  • This is not a bug in utf42

Common setups:

  • Linux / macOS: UTF-8 by default
  • Windows:
    • Use Windows Terminal, or
    • Run: chcp 65001
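On Windows, a program can also switch the console to UTF-8 itself rather than relying on chcp 65001. A minimal sketch, assuming the program runs in a Windows console: enable_utf8_console is a hypothetical helper, while SetConsoleOutputCP and CP_UTF8 (code page 65001) are part of the Win32 API. It only changes the console's output code page; on other platforms it does nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: sets the console output code page to UTF-8 on Windows.
// Returns false if the Win32 call fails; on other platforms it assumes the
// terminal is already configured for UTF-8 and simply returns true.
inline bool enable_utf8_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Call it once at the start of main, before printing. With MSVC, also compile with /utf-8 so that narrow string literals are stored as UTF-8.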

🔗 Inclusion in your project

This section explains how to include this library in your project.

Copy-paste inclusion

Simply copy the header utf42.h into your project. No additional setup or configuration is required.

CMake inclusion

You need CMake's FetchContent module (FetchContent_MakeAvailable requires CMake 3.14 or later):

include(FetchContent)

For more details, see the FetchContent documentation.

Then add the following code to your CMakeLists.txt to fetch the utf42 library:

# Fetch the utf42 library
FetchContent_Declare(
    utf42
    GIT_REPOSITORY https://github.com/dante19031999/utf-42
    GIT_TAG master
)
FetchContent_MakeAvailable(utf42)

After fetching the library, link it to your target, replacing mylib with your target's name and VISIBILITY with PRIVATE, PUBLIC or INTERFACE:

target_link_libraries(mylib VISIBILITY utf42::utf42)

For more details, see the target_link_libraries documentation.