Code points to binaries in Elixir
A code point is a numerical representation of a character in the Unicode Standard. Elixir uses UTF-8 to encode strings and binaries.
For example, code point for letter
ę equals to
iex(1)> ?ę 281
Here is how UTF-8 encoding serializes numbers on multiple bytes:
|Length||First byte||Following bytes|
281 code point requires 2 bytes for representation in UTF-8 encoding (a single UTF-8 byte can encode numbers from 0 to 127 only).
If we convert
281 straight to a binary, we will receive
100011001. Placing this sequence of 1/0 on placeholders for two-bytes length from the table above gives the following bytes:
11000100 11011001. In the base-10 system, they are represented as
153 which are our binaries:
iex(2)> "ę" <> <<0>> <<196, 153, 0>> iex(3)> "foo" <> <<0>> <<102, 111, 111, 0>>
Concatenating a string with with the null byte
<<0>> returns its inner binary representation.