GNU social JP
  • FAQ
  • Login
GNU social JPは日本のGNU socialサーバーです。
Usage/ToS/admin/test/Pleroma FE
  • Public

    • Public
    • Network
    • Groups
    • Featured
    • Popular
    • People

Untitled attachment

Download link

Notices where this attachment appears

  1. Embed this notice
    pwm (pwm@darkdork.dev)'s status on Tuesday, 01-Jul-2025 01:48:43 JST pwm pwm
    in reply to

    @Pi_rat @p @SuperDicq

    Hopefully some of this is helpful to you :)

    init.sql:

    If you are not writing portable sql schemas, then it is not necessary to write "integer not null primary key autoincrement." autoincrement incurs a performance penalty and should be omitted for integer primary keys1.

    Additionally, you may safely use datatype TEXT to store the names of your tags, as under the hood, sqlite will treat these the same. In fact, if you wish, you do not have to specify a datatype at all2

    tagg.py:

    The first, biggest thing to consider is the logic related to applying tags to images. Take a look at the non-standard (but very useful) upsert clause3. Present in both sqlite and postgresql, it allows you to handle a uniqueness constraint violation inside the query, without using program logic.

    For instance this would let you write something like:

    insert into image_tags values (:id, :tag) on conflict do nothing

    This turns duplicate tag application into a no-op, saving you some trouble and making your code nicer.

    Finally, in the sqlite implementation, cursor.executemany() is functionally equivalent to manually iterating over a loop yourself, and in this case would not require building a list of dictionaries manually to iterate over, if you are concerned with resource usage/speed.

    helpers.py

    init_db:

    It is not necessary to manually parse the database initialization script and execute the commands one at a time. Instead use cursor.executescript() with the loaded contents of the file as a parameter.

    add_images:

    It may be interesting to take an alternative approach to loading all images known in the directory, and instead repeatedly ask the database if a given file is already present. e.g.

    select 1 from images where dir_id = :dir_id and image = :image

    This will either return 1 or you will get an empty result set from your cursor. This will probably be faster and will definitely use less memory.

    Also, in general prefer named parameters over positional ones (:dir_id, :image) is better than (?, ?). This is stylistic and a personal choice so buyer beware.

    create_insert_tuple:

    This whole method can probably be gotten rid of by switching your logic to using the upsert syntax to apply tags.

    General Notes

    Where are you actually committing to the database? AFAICT you're never writing anything to disk unless somehow executemany is autocommitting???

    In general, shift all logic related to processing data into the database. It is very good at doing those tasks very fast. Brushing up on sql will go a long way here towards simplifying your python code as much as possible. The sqlite docs are wonderful.

    Try to avoid doing data processing in pure python, it is very not fast at that.

    Autoincrement in SQLite

    Notes on Datatypes

    Upsert

    In conversation about 5 months ago from darkdork.dev permalink
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.