GNU social JP
GNU social JP is a Japanese GNU social server.
Conversation

Notices

  1. Joaquim Homrighausen (joho@mastodon.online)'s status on Friday, 31-Jan-2025 19:55:58 JST

    Why does this PHP construct:

    normalizer_normalize( $search_string, \Normalizer::FORM_D );

    convert ÖÖÖ to OOO, but keep ÅÅÅ as ÅÅÅ ... WTF?! 🤔

    #programming #php #wtf #utf #utf8

    • Tobias Hellgren (thanius@mastodon.chuggybumba.com)'s status on Friday, 31-Jan-2025 19:55:58 JST

      @joho Because ö is a diacritic while å is a letter

    • Peter Krefting (nafmo@social.vivaldi.net)'s status on Friday, 31-Jan-2025 23:11:27 JST

      @joho NFD (#Unicode Normalization Form Canonical Decomposition) should fully decompose the strings, so Ö should become O + combining diaeresis, and Å (and Å) would be A + combining ring above.

      NFC (...Canonical Composition) is usually more compact: it recombines into base characters, so Ö stays an Ö, O + diaeresis becomes an Ö, and an Å becomes an Å.

      I would expect "FORM_D" to be NFD, but I am not a #PHP programmer.
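
The NFD/NFC behaviour described above can be checked outside PHP. The following is a minimal sketch using Python's standard `unicodedata` module rather than PHP's intl extension. The helper name `code_points` and the Ångström sign example (U+212B) are illustrative additions, not something taken from the thread.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of `text` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Ö and Å written as single precomposed code points.
precomposed = "\u00d6\u00c5"

nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", nfd)

print(code_points(nfd))  # ['U+004F', 'U+0308', 'U+0041', 'U+030A']
print(code_points(nfc))  # ['U+00D6', 'U+00C5']

# Assumption for illustration: the Angstrom sign U+212B, which is
# canonically equivalent to U+00C5 and looks identical on screen.
angstrom = "\u212b"
print(code_points(unicodedata.normalize("NFD", angstrom)))  # ['U+0041', 'U+030A']
print(code_points(unicodedata.normalize("NFC", angstrom)))  # ['U+00C5']
```

In this sketch both Ö and Å decompose under NFD. Normalization alone never yields a bare "O": the combining mark is still present after decomposition and has to be removed in a separate step.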

    • Peter Krefting (nafmo@social.vivaldi.net)'s status on Saturday, 01-Feb-2025 00:05:44 JST
      in reply to
      • Tobias Hellgren
      • Alerta! Alerta!
      • Lawrence Pritchard Waterhouse

      @joho @heiglandreas @thanius @lpwaterhouse I rolled my own transliteration, using RFC 1345 as a base, once.

      I do not recommend doing that (not only because the RFC is severely outdated now, but also because the output turns into garbage).

    • Joaquim Homrighausen (joho@mastodon.online)'s status on Saturday, 01-Feb-2025 00:05:45 JST
      in reply to
      • Tobias Hellgren
      • Peter Krefting
      • Alerta! Alerta!
      • Lawrence Pritchard Waterhouse

      @heiglandreas

      Yes, transliteration is the way to go in this case, which is what I'm doing now.

      Thanks for all the advice, and pointers in the right direction.

      @thanius @lpwaterhouse
      @nafmo

    • Joaquim Homrighausen (joho@mastodon.online)'s status on Saturday, 01-Feb-2025 00:05:46 JST
      in reply to
      • Tobias Hellgren
      • Alerta! Alerta!
      • Lawrence Pritchard Waterhouse

      @heiglandreas

      The data is stored in an SQL database. I've started to encrypt the (sensitive parts of) data at rest. So I need to do in-memory comparisons and sorting.

      Normally, I would compare w/all umlauts, etc, but in this particular case, I want to get a match on "vårsol" when I'm searching for "vårsol" or "varsol". And this matching is, after decryption, done in the application layer.

      (And I don't want to use specific database functionality to handle all this.)

      @thanius @lpwaterhouse
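
The matching requirement above ("vårsol" found by both "vårsol" and "varsol", done in the application layer after decryption) could be sketched roughly as follows. This is not the poster's code: it is a Python illustration that decomposes with NFD and drops combining marks, and the names `search_key` and `matches` are hypothetical.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold `text` to a diacritic-insensitive, case-insensitive key."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks, i.e. characters with a non-zero
    # canonical combining class (the ring in å, the dots in ö).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """Compare two strings by their folded keys."""
    return search_key(query) == search_key(candidate)


print(matches("varsol", "vårsol"))  # True
print(matches("vårsol", "vårsol"))  # True
print(matches("vinter", "vårsol"))  # False
```

The same key can serve for in-memory sorting, for example `sorted(values, key=search_key)`. Note that this only strips marks that NFD separates out. Letters without a canonical decomposition, such as ø, pass through unchanged, which is one reason the thread ends up preferring a proper transliteration step.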

    • Alerta! Alerta! (heiglandreas@phpc.social)'s status on Saturday, 01-Feb-2025 00:05:46 JST
      in reply to
      • Tobias Hellgren
      • Lawrence Pritchard Waterhouse

      @joho But wouldn't transliteration be more what you are looking for?

      'Cause Normalization just handles how the Unicode-Character is stored internally. So an 'Ä' should always 'look' the same, but the HEX-code might be different.

      But transliteration converts from something into something else. And in your case you want to compare kind of based on ASCII if I see that correctly.

      Feel free to check out https://andreas.heigl.org/2021/06/23/transliter-what/

      /cc @thanius @lpwaterhouse
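
To illustrate the point that an "Ä" can look the same while its hex code differs, here is a small Python sketch (again standing in for PHP, and only an illustration). It compares the UTF-8 bytes of the precomposed and decomposed spellings; the variable names are arbitrary.

```python
import unicodedata

precomposed = "\u00c4"  # Ä as one code point
decomposed = "A\u0308"  # A followed by COMBINING DIAERESIS

# Same appearance, different bytes.
print(precomposed.encode("utf-8").hex())  # c384
print(decomposed.encode("utf-8").hex())   # 41cc88
print(precomposed == decomposed)          # False

# Normalizing both sides to one form makes them compare equal,
# but the result is still "Ä", not an ASCII "A".
same = unicodedata.normalize("NFC", decomposed) == precomposed
print(same)                               # True
```

Normalization makes equivalent spellings comparable. Turning "Ä" into "A" is a different operation, which is what transliteration (or explicit mark stripping) provides.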

    • Joaquim Homrighausen (joho@mastodon.online)'s status on Saturday, 01-Feb-2025 00:05:48 JST
      in reply to
      • Tobias Hellgren
      • Alerta! Alerta!
      • Lawrence Pritchard Waterhouse

      @heiglandreas I didn't do that part, I'm just looking at the output, which is what I need to be correct.

      But @thanius and @lpwaterhouse may be onto something here.

      Maybe I'll just stick to transliteration then. I'm probably overworking the code, but I hate to leave things to "chance" when I develop.

    • Alerta! Alerta! (heiglandreas@phpc.social)'s status on Saturday, 01-Feb-2025 00:05:48 JST
      in reply to
      • Tobias Hellgren
      • Lawrence Pritchard Waterhouse

      @joho Stupid question perhaps: Why are you using normalization when the output just needs to look correct?

      What problem are you trying to solve?
      /cc @thanius @lpwaterhouse

    • Alerta! Alerta! (heiglandreas@phpc.social)'s status on Saturday, 01-Feb-2025 00:05:49 JST

      @joho And what do the hex characters actually say?

    • Alerta! Alerta! (heiglandreas@phpc.social)'s status on Saturday, 01-Feb-2025 00:05:50 JST

      @joho Wasn't there something with locale? Or the underlying ICU version? Something knocks from deep down in my mind....


GNU social JP is a social network, courtesy of the GNU social JP administrator. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.