1. Introduction
2. Overview
3. Code Units
4. Encoding
5. Code Point
6. Character Sets
7. Universal Character Set
8. Unicode
9. Bits
10. Valid Sequences
11. Overlong Sequences
12. Boundary Conditions
13. How Does My Converter Work
14. Assumptions
15. Static Member Functions
16. States
17. Data Structures
18. Arrays
19. Basic Conversion
20. Benchmarks
21. ASCII Optimization
22. How Does This Work
23. Counting Trailing Zeros
24. DFA Cycle
25. Input Data
26. Libraries
27. Timings
Description:
Explore a fast and efficient approach to UTF-8 conversion using C++, deterministic finite automata (DFAs), and SSE intrinsics in this conference talk from C++Now 2018. Dive into the intricacies of the UTF-8, UTF-16, and UTF-32 encodings and the distinction between code units and code points. Learn how to construct a DFA for fast UTF-8 conversion, implement the algorithm with simple lookup tables and plain C++ code, and use SSE intrinsics to improve performance further. Compare this method with common implementations on Windows and Linux, where it shows significant speed improvements. Gain insights into handling overlong and invalid byte sequences, optimizing ASCII conversion, and using static member functions and data structures for efficient processing.
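The description mentions building the converter from a DFA driven by simple lookup tables, rejecting overlong and invalid sequences, and adding an SSE fast path for ASCII. The C++ sketch below illustrates those ideas under stated assumptions and is not the talk's implementation. The state and byte-class names, the `utf8_dfa::Tables` struct, and the `utf8_to_utf32` function are all hypothetical and invented for illustration. The byte classes are derived from the Unicode Standard's table of well-formed UTF-8 byte sequences. The SSE2 path is compiled only for x86 targets with SSE2. On other targets, only the scalar DFA runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  // SSE2 intrinsics
#define UTF8_SKETCH_HAVE_SSE2 1
#endif

namespace utf8_dfa {  // hypothetical names, not taken from the talk

// BGN is both the start and the accepting state; ERR is a sink.
enum State : std::uint8_t { BGN, ERR, CS1, CS2, CS3, P3A, P3B, P4A, P4B, NUM_STATES };

// Byte classes: ASC 00..7F; CR1/CR2/CR3 continuations 80..8F/90..9F/A0..BF;
// ILL C0..C1 and F5..FF; L2A C2..DF; L3A E0; L3B E1..EC and EE..EF; L3C ED;
// L4A F0; L4B F1..F3; L4C F4.
enum Class : std::uint8_t { ASC, CR1, CR2, CR3, ILL, L2A, L3A, L3B, L3C, L4A, L4B, L4C, NUM_CLASSES };

struct Tables {
    std::array<std::uint8_t, 256> byte_class{};
    std::array<std::array<std::uint8_t, NUM_CLASSES>, NUM_STATES> transition{};
    std::array<std::uint8_t, NUM_CLASSES> payload_mask{};  // code point bits each class carries

    Tables() {
        for (int b = 0; b < 256; ++b)
            byte_class[b] = b < 0x80 ? ASC : b < 0x90 ? CR1 : b < 0xA0 ? CR2 : b < 0xC0 ? CR3
                          : b < 0xC2 ? ILL : b < 0xE0 ? L2A : b == 0xE0 ? L3A : b == 0xED ? L3C
                          : b < 0xF0 ? L3B : b == 0xF0 ? L4A : b < 0xF4 ? L4B : b == 0xF4 ? L4C : ILL;
        for (auto& row : transition) row.fill(ERR);
        transition[BGN][ASC] = BGN;
        transition[BGN][L2A] = CS1;
        transition[BGN][L3A] = P3A;
        transition[BGN][L3B] = CS2;
        transition[BGN][L3C] = P3B;
        transition[BGN][L4A] = P4A;
        transition[BGN][L4B] = CS3;
        transition[BGN][L4C] = P4B;
        for (Class c : {CR1, CR2, CR3}) {  // states that accept any continuation byte
            transition[CS1][c] = BGN;
            transition[CS2][c] = CS1;
            transition[CS3][c] = CS2;
        }
        transition[P3A][CR3] = CS1;                         // E0 80..9F would be overlong
        transition[P3B][CR1] = transition[P3B][CR2] = CS1;  // ED A0..BF would be a surrogate
        transition[P4A][CR2] = transition[P4A][CR3] = CS2;  // F0 80..8F would be overlong
        transition[P4B][CR1] = CS2;                         // F4 90..BF would exceed U+10FFFF
        payload_mask = {0x7F, 0x3F, 0x3F, 0x3F, 0x00, 0x1F, 0x0F, 0x0F, 0x0F, 0x07, 0x07, 0x07};
    }
};

#ifdef UTF8_SKETCH_HAVE_SSE2
// Portable trailing-zero count for a nonzero mask (a compiler intrinsic would be faster).
inline int count_trailing_zeros(unsigned m) {
    int n = 0;
    while ((m & 1u) == 0) { m >>= 1; ++n; }
    return n;
}
#endif

}  // namespace utf8_dfa

// Converts UTF-8 to UTF-32; `dst` needs room for `len` code points. Returns the
// number of code points written, or -1 for ill-formed or truncated input.
std::ptrdiff_t utf8_to_utf32(const std::uint8_t* src, std::size_t len, char32_t* dst) {
    using namespace utf8_dfa;
    assert((src != nullptr && dst != nullptr) || len == 0);
    static const Tables t;  // built once, on first call
    std::size_t i = 0;
    char32_t* out = dst;
    while (i < len) {
#ifdef UTF8_SKETCH_HAVE_SSE2
        while (len - i >= 16) {  // ASCII fast path, 16 bytes per iteration
            __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
            int mask = _mm_movemask_epi8(chunk);  // bit k is set iff byte k >= 0x80
            if (mask != 0) {  // copy the ASCII prefix, then let the DFA take over
                int n = count_trailing_zeros(static_cast<unsigned>(mask));
                for (int k = 0; k < n; ++k) *out++ = src[i + k];
                i += static_cast<std::size_t>(n);
                break;
            }
            const __m128i zero = _mm_setzero_si128();  // zero-extend 8 -> 16 -> 32 bits
            __m128i lo = _mm_unpacklo_epi8(chunk, zero), hi = _mm_unpackhi_epi8(chunk, zero);
            __m128i* o = reinterpret_cast<__m128i*>(out);
            _mm_storeu_si128(o + 0, _mm_unpacklo_epi16(lo, zero));
            _mm_storeu_si128(o + 1, _mm_unpackhi_epi16(lo, zero));
            _mm_storeu_si128(o + 2, _mm_unpacklo_epi16(hi, zero));
            _mm_storeu_si128(o + 3, _mm_unpackhi_epi16(hi, zero));
            out += 16;
            i += 16;
        }
        if (i == len) break;
#endif
        std::uint8_t state = BGN;  // run the DFA over exactly one code point
        char32_t cp = 0;
        do {
            if (i == len) return -1;  // input ends mid-sequence
            std::uint8_t b = src[i++];
            std::uint8_t c = t.byte_class[b];
            cp = (state == BGN) ? char32_t(b & t.payload_mask[c]) : char32_t((cp << 6) | (b & 0x3Fu));
            state = t.transition[state][c];
            if (state == ERR) return -1;
        } while (state != BGN);
        *out++ = cp;
    }
    return out - dst;
}
```

Classifying each byte before indexing the transition table keeps the table at 9 states by 12 classes. Every overlong, surrogate, and out-of-range rule is still encoded as an ordinary transition to `ERR`. In the fast path, `_mm_movemask_epi8` gathers the high bit of all 16 bytes into one integer. A zero mask therefore means the whole chunk is ASCII. Otherwise, a trailing-zero count locates the first non-ASCII byte. This relates only loosely to the outline's "Counting Trailing Zeros" topic, because the talk's exact code is not reproduced here.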

Fast Conversion From UTF-8 with C++, DFAs, and SSE Intrinsics

CppNow