GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 19:59:45 JST

    Trying to figure out how to print a UTF32 character in C and so far the answer seems to be "you can't"

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 19:59:44 JST
      in reply to

      @eniko On conforming implementations, printf("%lc", unicode_codepoint_val);

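A minimal sketch of the `printf("%lc", ...)` approach, assuming a C library whose `wchar_t` values are Unicode code points (glibc and musl are examples; Windows is not). A program has to leave the default "C" locale with `setlocale` before non-ASCII characters can be converted. The helper names `use_environment_locale` and `print_codepoint` are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Switch from the default "C" locale to the one named by the environment
   (LANG, LC_ALL, ...), which on most modern systems uses UTF-8.
   Returns 0 on success, -1 if the requested locale is not available. */
int use_environment_locale(void) {
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}

/* Print one code point as a multibyte character in the current locale.
   Assumes wchar_t holds Unicode code points (e.g. glibc or musl).
   Returns the number of bytes written, or a negative value if the
   character cannot be represented in the current locale. */
int print_codepoint(wint_t cp) {
    return printf("%lc", cp);
}
```

For example, calling `use_environment_locale()` once at startup and then `print_codepoint(0x1F600)` should print U+1F600 in a UTF-8 terminal; left in the plain "C" locale, the same call typically fails with a negative return value.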
    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 20:00:50 JST
      in reply to

      @eniko The myth that this is hard is entirely Microsoft's implementation being gratuitously and intentionally broken.

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 20:44:53 JST
      in reply to

      @eniko wint_t, but default promotions from wchar_t should be fine.

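Following the reply that `%lc` formally takes a `wint_t`, this sketch formats a `wchar_t` into a buffer with an explicit cast. The function name `format_wchar` is hypothetical. Per the same reply, the cast is optional, since passing a `wchar_t` and letting default argument promotion apply should also work.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Format one wide character into buf as a multibyte string in the
   current locale. %lc expects a wint_t, so the cast makes the expected
   type explicit. Returns what snprintf returns: the number of bytes the
   character needs, or a negative value if it cannot be converted. */
int format_wchar(char *buf, size_t size, wchar_t wc) {
    return snprintf(buf, size, "%lc", (wint_t)wc);
}
```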
    • Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 20:44:54 JST
      in reply to
      • Rich Felker

      @dalias what type is unicode_codepoint_val

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 20:51:20 JST
      in reply to

      @eniko Because Windows is wrong. If wchar_t is too narrow for full Unicode, you're not allowed to support all of Unicode. C explicitly forbids "multi-wchar_t chars" (and thus UTF-16), which Windows uses because Microsoft insisted on contradicting the experts in the early '90s who told them 16 bits wasn't enough, and got themselves stuck. C11 strongly prefers that wchar_t numeric values be UCS code points (there's a macro, __STDC_ISO_10646__, that tells you this) and, unless I'm misremembering, C23 requires it.

      Haelwenn /элвэн/ :triskell: likes this.
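Both concerns raised here (whether `wchar_t` values are Unicode code points, and how wide `wchar_t` is) can be checked rather than guessed. A minimal sketch, assuming a C99-or-later compiler: `__STDC_ISO_10646__` is the standard macro promising code-point values, and `WCHAR_MAX` from `<wchar.h>` gives the type's range. The function names are hypothetical.

```c
#include <stdbool.h>
#include <wchar.h>

/* True if the implementation promises that wchar_t values are ISO 10646
   (Unicode) code points, which is what __STDC_ISO_10646__ signals. */
bool wchar_is_unicode(void) {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

/* True if wchar_t can hold every code point up to U+10FFFF.
   False where wchar_t is 16 bits, as on Windows. */
bool wchar_covers_all_unicode(void) {
    return WCHAR_MAX >= 0x10FFFF;
}
```

On glibc both functions typically return true; with a 16-bit `wchar_t` the second returns false, which is the case where `%lc` cannot reach characters outside the Basic Multilingual Plane.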
    • Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 20:51:21 JST
      in reply to
      • Rich Felker

      @dalias everything i've found tells me not to use wchar_t because it is unclear what width it's going to be

    • Lulu · לולו (lulu@hachyderm.io)'s status on Sunday, 08-Dec-2024 20:57:57 JST
      in reply to
      • Rich Felker

      @dalias @eniko

      The fact that UTF-16 can't die is just wild.

    • Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 20:58:48 JST
      in reply to
      • Rich Felker

      @dalias ok so then how do i support printing 32-bit unicode code points cross-platform?

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 20:58:48 JST
      in reply to

      @eniko With modern Windows, you can set the locale codepage to UTF-8, and it should just work doing everything in UTF-8 without touching wchar_t. Arguably this is the best way to do things, but it doesn't respect legacy Unix systems with non-UTF-8 encodings. Modern C also has char32_t (always UTF-32), which can be used if you're worried the system wchar_t is broken like on Windows, but what you can easily do with it is limited.

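The thread gives no code for the "do everything in UTF-8" route. One locale-independent way to get there, a substitution rather than anything proposed in the thread, is to encode each code point into UTF-8 bytes by hand and write those bytes directly. This assumes whatever reads the output expects UTF-8, which is exactly the legacy-system caveat in the reply. `encode_utf8` and `print_utf8` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a Unicode
   scalar value (above U+10FFFF, or a UTF-16 surrogate). */
size_t encode_utf8(uint_least32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write one code point to stdout as UTF-8; returns 0 on success, -1 otherwise. */
int print_utf8(uint_least32_t cp) {
    unsigned char buf[4];
    size_t n = encode_utf8(cp, buf);
    if (n == 0)
        return -1;
    return fwrite(buf, 1, n, stdout) == n ? 0 : -1;
}
```

On Windows, the console typically also needs a UTF-8 output code page for these bytes to display correctly.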
    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 21:16:46 JST
      in reply to

      @eniko 7.30.1 from C23:


      Attachments


      1. https://media.hachyderm.io/media_attachments/files/113/617/150/829/546/604/original/1e9e11aa94f5ba94.png
    • Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 21:16:47 JST
      in reply to
      • Rich Felker

      @dalias from what I read, char32_t isn't actually guaranteed to be UTF-32, and also I couldn't find a way to print it

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 21:30:05 JST
      in reply to

      @eniko Unfortunately, the only way to print it is to use c32rtomb to convert it to a multibyte char string in the current locale's encoding (in any reasonable setup, this is UTF-8).

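A minimal sketch of the `c32rtomb` route, assuming a C11 library that provides `<uchar.h>` (glibc and musl do). A real program should call `setlocale(LC_ALL, "")` first; otherwise the conversion runs in the "C" locale, where typically only ASCII succeeds. `char32_to_mb` and `print_char32` are hypothetical names.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Convert one char32_t to a multibyte character in the current locale's
   encoding. out needs room for MB_LEN_MAX bytes. Returns the byte count,
   or (size_t)-1 if the locale cannot represent the character. */
size_t char32_to_mb(char32_t c, char out[MB_LEN_MAX]) {
    mbstate_t state;
    memset(&state, 0, sizeof state); /* start in the initial shift state */
    return c32rtomb(out, c, &state);
}

/* Print one char32_t to stdout; returns 0 on success, -1 on failure. */
int print_char32(char32_t c) {
    char buf[MB_LEN_MAX];
    size_t n = char32_to_mb(c, buf);
    if (n == (size_t)-1)
        return -1;
    return fwrite(buf, 1, n, stdout) == n ? 0 : -1;
}
```

Passing an explicit zeroed `mbstate_t` instead of `NULL` avoids relying on the hidden internal state that `c32rtomb` would otherwise use.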
    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 21:44:05 JST
      in reply to
      • Ben Evans
      • Lulu · לולו

      @kittylyst @lulu @eniko Getting rid of Java? 😈

    • Ben Evans (kittylyst@mastodon.social)'s status on Sunday, 08-Dec-2024 21:44:06 JST
      in reply to
      • Rich Felker
      • Lulu · לולו

      @lulu @dalias @eniko Java's internal representation for non-ASCII strings is UTF-16, and it's not immediately clear how that could be changed. So I think it'll be around for the foreseeable future.

    • Ben Evans (kittylyst@mastodon.social)'s status on Sunday, 08-Dec-2024 21:45:57 JST
      in reply to
      • Rich Felker
      • Lulu · לולו

      @dalias @lulu @eniko Number of active server JVMs in the wild continues to increase, having doubled in ~6 years IIRC.

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 21:45:57 JST
      in reply to
      • Ben Evans
      • Lulu · לולו

      @kittylyst @lulu @eniko 🤮

    • Rich Felker (dalias@hachyderm.io)'s status on Sunday, 08-Dec-2024 21:57:01 JST
      in reply to

      @eniko C23 now mandates that.

    • Eniko Fox (eniko@peoplemaking.games)'s status on Sunday, 08-Dec-2024 21:57:02 JST
      in reply to
      • Rich Felker

      @dalias i found https://beej.us/guide/bgc/html/split/unicode-wide-characters-and-all-that.html earlier and it says:

      are values in these stored in UTF-16 or UTF-32? Depends on the implementation.

      But you can test to see if they are. If the macros __STDC_UTF_16__ or __STDC_UTF_32__ are defined (to 1) it means the types hold UTF-16 or UTF-32, respectively.


      Attachments

      1. Beej's Guide to C Programming
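The macro test described in the Beej's Guide quote can be done at compile time. A minimal sketch: the `__STDC_VERSION__` branch encodes the claim made in this thread that C23 mandates UTF-32 for `char32_t`, so treat that branch as an assumption. `char32_is_utf32` is a hypothetical name.

```c
#include <stdbool.h>
#include <uchar.h>

/* True if char32_t values are UTF-32 code points. Pre-C23 implementations
   signal this by defining __STDC_UTF_32__; the second condition assumes
   C23 makes the guarantee unconditional. */
bool char32_is_utf32(void) {
#if defined(__STDC_UTF_32__) || (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L)
    return true;
#else
    return false;
#endif
}
```

GCC and Clang predefine `__STDC_UTF_32__`, so with those compilers the function should return true.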


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.