Embed Notice

HTML Code

<blockquote style="position: relative; padding-left: 55px;"><section><a href="https://freesoftwareextremist.com/objects/dc3b7426-0181-401d-9609-0b052a1964db">翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Sunday, 12-May-2024 20:11:01 JST</a><a href="https://freesoftwareextremist.com/users/Suiseiseki" title="suiseiseki@freesoftwareextremist.com"><img src="https://gnusocial.jp/avatar/789-48-20220724040913.webp" width="48" height="48" alt="翠星石" style="position: absolute; left: 0; top: 0;">翠星石</a><div><a href="https://poa.st/objects/35903b85-bc5f-48e9-8dca-ca9dae288935" rel="in-reply-to">in reply to</a><ul><li></ul></div></section><article><a href="https://poa.st/users/white_male">@white_male</a> No, as the low byte of a UTF-16 code unit may be 128 or larger, it may even be the NUL byte (which terminates C strings), and UTF-16 characters may be 4 bytes wide.<br><br>Aside from a few exceptions like the byte order mark, every valid UTF-16 sequence maps to a UTF-8 sequence for the same code points, but you'll need to use something like GNU iconv to convert it.<br><br>Still, UTF-16 is a useless encoding: it almost always leads to a larger file size than UTF-8 (even for books in Chinese characters, as book formats typically contain much more ASCII formatting than text, and ASCII characters double in size when encoded as UTF-16), it's still variable-width (2 or 4 bytes per character), it isn't self-synchronizing at the byte level, and it has big-endian and little-endian variants.</article><footer><a rel="bookmark" href="https://gnusocial.jp/conversation/3078604#notice-6102681">In conversation</a><time datetime="2024-05-12T20:11:01+09:00" title="Sunday, 12-May-2024 20:11:01 JST">about a year ago</time> <span>from <span><a href="https://freesoftwareextremist.com/objects/dc3b7426-0181-401d-9609-0b052a1964db" rel="external" title="Sent from freesoftwareextremist.com via ActivityPub">freesoftwareextremist.com</a></span></span><a 
href="https://freesoftwareextremist.com/objects/dc3b7426-0181-401d-9609-0b052a1964db">permalink</a></footer></blockquote>

Corresponding Notice

翠星石 (suiseiseki@freesoftwareextremist.com)'s status on Sunday, 12-May-2024 20:11:01 JST
in reply to
- white_male
@white_male No, as the low byte of a UTF-16 code unit may be 128 or larger, it may even be the NUL byte (which terminates C strings), and UTF-16 characters may be 4 bytes wide.
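
A minimal Python sketch of those three cases. The helper name `utf16le` and the sample characters are picked for illustration; little-endian order is assumed so the low byte prints first.

```python
def utf16le(text: str) -> bytes:
    """Encode text as little-endian UTF-16 with no byte order mark."""
    return text.encode("utf-16-le")


# Hypothetical sample characters, one per case mentioned above.
samples = {
    "U+0041 'A', high byte is NUL": "A",
    "U+00E9, low byte is 0xE9 (>= 128)": "\u00e9",
    "U+0100, low byte is NUL": "\u0100",
    "U+1F600, needs a surrogate pair": "\U0001F600",
}

for label, char in samples.items():
    encoded = utf16le(char)
    # bytes.hex(sep) needs Python 3.8 or newer.
    print(f"{label}: {encoded.hex(' ')} ({len(encoded)} bytes)")
```

Any of those NUL bytes would end the string early if the buffer were handed to C functions that expect NUL-terminated text.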

Aside from a few exceptions like the byte order mark, every valid UTF-16 sequence maps to a UTF-8 sequence for the same code points, but you'll need to use something like GNU iconv to convert it.
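
As a rough sketch of such a conversion, here is the same idea using Python's built-in codecs instead of iconv (with iconv itself this would be along the lines of `iconv -f UTF-16 -t UTF-8`). The function name `utf16_to_utf8` and the sample bytes are assumptions made for the example, and the input is assumed to start with a byte order mark.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert BOM-prefixed UTF-16 bytes to UTF-8.

    The "utf-16" codec reads the leading byte order mark to pick the
    byte order and does not copy it into the decoded text. Unpaired
    surrogates raise UnicodeDecodeError under the default strict mode.
    """
    text = data.decode("utf-16")
    return text.encode("utf-8")


# Hypothetical input: a big-endian BOM (FE FF) followed by big-endian text.
sample = b"\xfe\xff" + "n\u00e9 \u7fe0".encode("utf-16-be")

converted = utf16_to_utf8(sample)
print(len(sample), "bytes of UTF-16 ->", len(converted), "bytes of UTF-8")
```

The byte order mark is dropped rather than carried over, which is one way to read the "exceptions" mentioned above.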

Still, UTF-16 is a useless encoding: it almost always leads to a larger file size than UTF-8 (even for books in Chinese characters, as book formats typically contain much more ASCII formatting than text, and ASCII characters double in size when encoded as UTF-16), it's still variable-width (2 or 4 bytes per character), it isn't self-synchronizing at the byte level, and it has big-endian and little-endian variants.
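
The size argument can be checked with a small sketch like the one below. The markup string is a made-up stand-in for a book format, not taken from any real file, and `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, UTF-16 size) in bytes, both without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Made-up example: a little CJK text wrapped in ASCII markup.
markup_heavy = '<p class="line"><span lang="zh">翠星石</span></p>'
# Bare CJK text with no markup at all, for contrast.
cjk_only = "翠星石" * 10

for name, text in (("markup-heavy", markup_heavy), ("CJK only", cjk_only)):
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

Bare CJK text is the case where UTF-16 comes out smaller (2 bytes per character instead of 3), so the result depends on how much ASCII surrounds the text.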
In conversation, about a year ago, from freesoftwareextremist.com
