So I have been making an image tagging program that will allow keys to be bound to tags, so an image can be tagged with multiple tags at once. This is my first serious project and I have been having a lot of fun working on it.
This is a prototype script for it. It requires (-f) a file with the names of images in (-d) a directory to be tagged with (-t) a tag. It uses sqlite3 as the db (and so will the GUI version). I still have to write the query script; this one can only tag. All of the files have to be in the same directory, and it creates the db file $HOME/.local/share/tagg/tags.db (--init)
I was first going for a separate db for every directory, but then a friend suggested I "not do that". Retroactively adding this was not fun, but it is done. -u is needed the first time for every directory.
I use sxiv -o to get image names.
Now I'm certainly doing a lot of things wrong, like having raw sql statements in functions and passing the cursor around. So I need criticism, and hard. Be harsh, make me cry.
PS @p How do you manage your images, since you always seem to have relevant images ready for the occasion?
> How do you manage your images since you always seem to have relevant images ready for occasion.
Have you ever used hydrus? https://hydrusnetwork.github.io/hydrus/index.html I've been using it for years now.
> @p How do you manage your images since you always seem to have relevant images ready for occasion.
I have a good memory (see attached) and also `find|grep` and also somewhat more recently I installed Tesseract and that lets me grep images.
As far as the code, just from reading the SQL it looks like this is how you'd want to structure it. sqlite3 is good for this kind of thing; it's probably the easiest option. (Flat files and grep aren't to be underestimated, but you don't learn a lot from doing things that way.) incelligence.jpeg
@SuperDicq@p@laurel@Pi_rat that is extremely complicated, and i can tell this because i myself tried to implement that. You keep a filename: namespace if you need filenames and not the structure (you can also import structure as tags) man held up by ps3 games.jpg
I'm most curious about what the UI for this will look like because that will be what makes or breaks it for me. Tagging an image with multiple tags and then going to the next should require the least amount of clicks and button presses.
> Just leave my fucking files where they are and don't make a copy.
ABSOLUTELY YES THAT
When x11amp^Wxmms finally stopped building, the main thing I wanted from a music player was "I give it a directory and it plays the music in that directory."
What happens if you've got a couple of million images and you start moving things around? Does it go through all moved directories and recompute hashes? Things get even worse with architectures that support publishing/sharing your images and tags with others.
@SuperDicq@p I am thinking more like sxiv with a sidebar (and sxiv key binds, so n saves whatever tags were added to this image and moves to the next) for manipulating tags and the keys they are bound to. I'm also thinking about "views" for advanced querying, which I will add, like (meme or pepe) and (computer or wizard)
I have not personally used Hydrus or any other application; I tried DigiKam but gave up on it quite early. This seemed like an easy first project, so I thought why not. Mainly I needed an app I can tag with quickly, but thinking and writing, I have so many ideas for the GUI version, like multiple OR and AND clauses...
@p@SuperDicq I can relate. I can generally guess the time frame where an image will be depending on the surrounding images, but I would still like to be able to quickly find them......
@SuperDicq@hj@laurel@Pi_rat Yeah, I use cmus. `:add sound/` and there it all is. I don't know how easy it is to remap keys in VLC, but I actually built cmus on my n900 so that I could push keyboard buttons without looking at the device and crashing my car. (I still use the uConsole like this.)
How is it going to know it's the same files? If there are any discrepancies between its internal representation and the files on disc, it will have to recompute image hashes.
If you are not writing portable sql schemas, then it is not necessary to write "integer not null primary key autoincrement". autoincrement incurs a performance penalty and should be omitted for integer primary keys.[1]
Additionally, you may safely use datatype TEXT to store the names of your tags, as under the hood, sqlite will treat these the same. In fact, if you wish, you do not have to specify a datatype at all.[2]
The first, biggest thing to consider is the logic related to applying tags to images. Take a look at the non-standard (but very useful) upsert clause.[3] Present in both sqlite and postgresql, it allows you to handle a uniqueness constraint violation inside the query, without using program logic.
For instance this would let you write something like:
insert into image_tags values (:id, :tag) on conflict do nothing
This turns duplicate tag application into a no-op, saving you some trouble and making your code nicer.
Finally, in the sqlite implementation, cursor.executemany() is functionally equivalent to manually iterating over a loop yourself, and in this case would not require building a list of dictionaries manually to iterate over, if you are concerned with resource usage/speed.
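To make that concrete, here's a minimal runnable sketch combining the upsert with executemany. The table and column names here are simplified stand-ins, not your real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real one has more columns.
db.executescript("""
    create table image_tags (
        image_id integer not null,
        tag_id   integer not null,
        primary key (image_id, tag_id)
    );
""")

# One dict per row; note the intentional duplicate.
pairs = [{"id": 1, "tag": 10}, {"id": 1, "tag": 10}, {"id": 2, "tag": 10}]

# Duplicate tag applications become no-ops instead of raising IntegrityError.
db.executemany(
    "insert into image_tags values (:id, :tag) on conflict do nothing",
    pairs,
)
db.commit()
print(db.execute("select count(*) from image_tags").fetchone()[0])  # → 2
```

(Upsert needs SQLite 3.24 or newer, which any remotely recent system will have.)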
It is not necessary to manually parse the database initialization script and execute the commands one at a time. Instead use cursor.executescript() with the loaded contents of the file as a parameter.
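e.g. something like this; the schema string here is just a stand-in for the contents of your actual init file (which you'd read with open("schema.sql").read() or similar):

```python
import sqlite3

# Stand-in for the real initialization script loaded from disk.
schema = """
create table images (id integer primary key, image text);
create table tags   (id integer primary key, tag text);
"""

db = sqlite3.connect(":memory:")
db.executescript(schema)  # runs every statement in one call

# Verify both tables were created.
tables = [r[0] for r in db.execute(
    "select name from sqlite_master where type = 'table' order by name")]
print(tables)  # → ['images', 'tags']
```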
add_images:
It may be interesting to take an alternative approach to loading all images known in the directory, and instead repeatedly ask the database if a given file is already present. e.g.
select 1 from images where dir_id = :dir_id and image = :image
This will either return 1 or you will get an empty result set from your cursor. This will probably be faster and will definitely use less memory.
Also, in general, prefer named parameters over positional ones; (:dir_id, :image) is better than (?, ?). This is stylistic and a personal choice, so buyer beware.
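Putting both points together, a sketch of the existence check using named parameters (the table layout is a guess at the real one, and image_known is a hypothetical helper name):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table images (dir_id integer, image text)")
db.execute("insert into images values (1, 'cat.jpg')")

def image_known(cur, dir_id, image):
    # Named parameters are supplied as a dict; fetchone() returns None
    # when the result set is empty.
    row = cur.execute(
        "select 1 from images where dir_id = :dir_id and image = :image",
        {"dir_id": dir_id, "image": image},
    ).fetchone()
    return row is not None

cur = db.cursor()
print(image_known(cur, 1, "cat.jpg"))  # → True
print(image_known(cur, 1, "dog.jpg"))  # → False
```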
create_insert_tuple:
This whole method can probably be gotten rid of by switching your logic to using the upsert syntax to apply tags.
General Notes
Where are you actually committing to the database? AFAICT you're never writing anything to disk unless somehow executemany is autocommitting???
In general, shift all logic related to processing data into the database. It is very good at doing those tasks very fast. Brushing up on sql will go a long way here towards simplifying your python code as much as possible. The sqlite docs are wonderful.
Try to avoid doing data processing in pure python, it is very not fast at that.
> which desktop OS filesystem does not support hardlinks?
When you buy a USB stick, what is the filesystem format?
Hard links also can't cross FS boundaries. People can run into this without knowing: Armbian, for example, likes to mount a tmpfs over some directories and then sync so as not to wear down your SD card.
If people remove the images, what are you going to do? Are you going to track it to make sure it's the only link? What about writes in place, like tweaking the EXIF data?
It's better to just deal with files where they are.
> When you buy a USB stick, what is the filesystem format?
FAT/exFAT is not a desktop OS filesystem, it's a removable storage filesystem. Neither is NFS or anything else you can think of that doesn't support hardlinks.
> Hard links also can't cross FS boundaries.
okay, but that's why you keep your Hydrus storage next to where you normally store these files... on the same filesystem...
> If people remove the images, what are you going to do? Are you going to track it to make sure it's the only link?
yeah it's actually quite easy I do it all the time
> What about writes in place, like tweaking the EXIF data?
I do this all the time with my music library and it doesn't break the hardlink and leave me with two different files?
like I literally suggested that this be an OPTION and you came in here firing in all directions like
"ACKCHYUALLY YOU CAN'T DO THAT BCAUSE I CAN INVENT AN INCOMPATIBLE USE CASE"
get a hobby and stop always trying to be the smartest person in the room, it's not a good look
Either we're talking about people that don't know whether their USB stick is FAT or not, or we're talking about people that might not be running a "desktop OS filesystem". Samba isn't a super rare thing, either.
> like I literally suggested that this be an OPTION and you came in here firing in all directions like
I just said it was not always going to work, you argued, I replied. I don't know what you want from me. If you don't want to argue about it, don't argue about it.
> get a hobby and stop always trying to be the smartest person in the room, it's not a good look
You jumped in the thread to suggest hard links; I just said it's not always going to work. I didn't post caricatures at you, speculate that you had some motivation like "trying to be the smartest person in the room": you jumped in, suggested that someone else patch this software to include your idea, I have never even heard of this program before this thread, I just think hard links are a bad idea, and now you're acting like I cut you off in traffic. Chill, dude. firing-in-all-directions.png
> If you are not writing portable sql schemas, then it is not necessary to write "integer not null primary key autoincrement". autoincrement incurs a performance penalty and should be omitted for integer primary keys.
I had read about autoincrement being slow, but I thought the alternative would be generating unique ids myself. I will fix the code to use rowids.
> Additionally, you may safely use datatype TEXT to store the names of your tags, as under the hood, sqlite will treat these the same.
I was going for VARCHAR(255) (the max file name length in ext4), but then a friend pointed out that the database should not be doing those checks. My reasoning was that setting a limit on the text would help the db have a standard row size. I also later came across the fact that sqlite does not actually apply this constraint, and it is my error that I did not change it.
> The first, biggest thing to consider is the logic related...without using program logic.
This did come up, but I took a break and later did not have any motivation. I have forced myself for the past few days, but this is becoming fun again, so I will try to put in much more effort. And as you said, it is non-standard, so it felt weird; I was originally going to try/except and log duplicates.
> Finally, in the sqlite implementation, cursor.executemany() is functionally equivalent to manually iterating over a loop yourself, and in this case would not require building a list of dictionaries manually to iterate over, if you are concerned with resource usage/speed.
I thought it would be optimized. I will refactor around this. I was going for dicts since qmark style is going to be deprecated (and as you said further along, I agree dicts are cleaner)
I had a strong feeling that this could be done in a better way; I was split between that and having the whole schema in the init function (but that did not sit right either). I did come across executescript in the docs, but in the excitement of working on the tagging logic I did not give it much attention. Will fix this.
> add_images:
> It may be interesting to take an alternative approach to loading all images known in the directory, and instead repeatedly ask the database if a given file is already present. e.g.
> select 1 from images where dir_id = :dir_id and image = :image
> This will either return 1 or you will get an empty result set from your cursor. This will probably be faster and will definitely use less memory.
As you have pointed out, I'm thinking of upsert syntax to just ignore duplicate inserts. Originally I was trying to figure out a way to do "SELECT id FROM images WHERE image IN ?", and I tried every combination of qmark and dicts, but it was no dice. It could be done with string formatting, but I'd rather not use it since I'm doing parametrized queries everywhere else; that would be bad.
> create_insert_tuple:
> This whole method can probably be gotten rid of by switching your logic to using the upsert syntax to apply tags.
It would be extremely cool and I will try hard for this. This one function was what kept me debugging all of today, and the problem was that I was passing "directory" instead of "dir_id" in tagg.py. There was also a problem with an if statement where I errantly assumed empty list == False, but it was not so. I also don't like setting image_id for every image; allocating it is inefficient.
> General Notes
> Where are you actually committing to the database? AFAICT you're never writing anything to disk unless somehow executemany is autocommitting???
Ah yes, I have autocommit turned on as long as there are no exceptions; I will add an explicit commit. I was not even closing the db, mistakenly thinking that the context manager would handle closing it for me. I just learned today that I have to close the cursor too.
> In general, shift all logic related to processing data into the database. It is very good at doing those tasks very fast. Brushing up on sql will go a long way here towards simplifying your python code as much as possible. The sqlite docs are wonderful.
> Try to avoid doing data processing in pure python, it is very not fast at that.
Is it correct to use "sqlite3" directly, using functions, passing the cursor around them, and writing sql statements inside functions? I'm not sure if there is a standard way to do this, but personally it seems off. Since python is OOP, shouldn't I be creating classes and adding methods? i.e. cursor.select_image(image_id) == cursor.execute("SELECT image FROM images WHERE id = ?", (image_id,))
My absolute first draft of it, compared to this, was such a mess: I was not using argparse and the schema was a HORROR. I was certain I would have to make a separate table for every tag, which would hold image ids. A friend pointed me to the correct many-to-many relation table and that was such a bombshell moment. I learned sql at the end of last year, and at that time I did make correct many-to-many schemas, but since then I have not kept up with it. I LOVE sql(ite) and will read all the docs you have linked with fervour. Thank you VERY much for giving such a detailed review. I will follow up with you when I implement queries.
(apolocheese for typos and broken sentences, I deleted post because it had formatting issues, hope they are fixed this time)
> but then friend pointed that database should not be doing those checks
You definitely want the DB to enforce as much as you can, but filename length, maybe not. sqlite3 data types aren't enforced, that is correct, but they do usually act as hints to your ORM or library.
> Is it correct to use "sqlite3" directly and using functions and passing cursor around them, writing sql statements inside function.
For a small enough program, that's fine. But you should probably wrap it in an object so you don't have to deal with details of the query or the DB in other functions, keep the logic clean so functions are topical and legible. I don't know how Python deals with iterators, maybe better to model a cursor as a thing that yields rows to whoever asks.
> I had read about autoincrement being slow but I thought alternative to it would be me generating unique ids myself. I will fix code to use rowids.
I may not have made myself clear here: you need not fuss with rowids yourself, you may simply use integer primary key.
> I thought it would be optimized. I will refactor around this. I was going for dicts since qmark style is going to be deprecated (and as furthur along you have said, I agree dicts are cleaner)
This is a good intuition! In other database drivers (psycopg2 for example), executemany is optimized.
> As you have pointed out, I'm thinking of upsert syntax to just ignore duplicate inserts. Originally I was trying to figure out a way to do "SELECT id FROM images WHERE image IN ?", and I tried every combination of qmark and dicts, but it was no dice. It could be done with string formatting, but I'd rather not use it since I'm doing parametrized queries everywhere else; that would be bad.
Good catch here, I don't know how I missed using upsert here but you're absolutely right, this is a good use case for it.
> Ah yes, I have autocommit turned on as long as there are no exceptions; I will add an explicit commit. I was not even closing the db, mistakenly thinking that the context manager would handle closing it for me. I just learned today that I have to close the cursor too.
When you use the actual python context manager syntax. (One caveat: in the stdlib sqlite3 module, "with connection:" only manages the transaction, committing or rolling back; it does not close anything, and cursors don't support "with" at all, so wrap both in contextlib.closing.)
from contextlib import closing
with closing(sqlite3.connect("yourdb.sqlite")) as db:
    with closing(db.cursor()) as cursor:
        # do your query etc etc
        ...
    # here the cursor is now closed
    db.commit()
# here the database connection is now closed
In this case, the python context manager will automatically call the appropriate close function as you fall out of the indented block. If you do it in the imperative style you have been using, you must call close() on each cursor and database connection yourself. In a purely technical sense, all of that is cleaned up when the program exits, and this is not a long-running server process, but it's good practice to use one of these two methods to keep things tidy yourself. Many other objects use this protocol; iirc you used it when opening and reading a file.
> Is it correct to use "sqlite3" directly, using functions, passing the cursor around them, and writing sql statements inside functions? I'm not sure if there is a standard way to do this, but personally it seems off.
You may pass the connection or cursor object in a function call. However, things get hairier if you start to multithread or multiprocess. sqlite3 module objects are not threadsafe and you will get programming errors if you try to use e.g. a cursor outside of the thread it was created in. This is not really a relevant concern here, though.
As for code style, it might be slightly "cleaner" to pass the database object to functions that use the database, create a cursor for the scope of the function, and close the cursor before you return. This will still allow the callee function control over, e.g., committing to the database in case of a DML statement.
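A sketch of that shape, with a hypothetical add_tag function (the schema is made up for illustration):

```python
import sqlite3
from contextlib import closing

def add_tag(db, name):
    # The cursor lives only for the scope of this function; the caller
    # keeps the connection, and this function decides when to commit.
    with closing(db.cursor()) as cur:
        cur.execute("insert into tags (tag) values (:tag)", {"tag": name})
    db.commit()

db = sqlite3.connect(":memory:")
db.execute("create table tags (id integer primary key, tag text)")
add_tag(db, "wizard")
print(db.execute("select tag from tags").fetchone()[0])  # → wizard
```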
> Since python is OOP, shouldn't I be creating classes and adding methods? i.e. cursor.select_image(image_id) == cursor.execute("SELECT image FROM images WHERE id = ?", (image_id,))
It is bad practice to attach methods to objects after the object is instantiated, which is what you would be doing were you to do as you stated in this example.
I wouldn't get too caught up in trying to follow any particular paradigm here; your codebase is small and it's a for-fun project. Short of using an ORM to abstract away handling the database connection, carefully passing a connection object around is a-ok, and is what the ORM is doing under the hood, anyway.
> I LOVE sql(ite) and will read all the docs you have linked with fervour. Thank you VERY much for giving such a detailed review. I will follow up with you when I implement queries.
You're welcome! SQLite is one of my favorite pieces of software, too, if you couldn't tell! I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
> You definitely want the DB to enforce as much as you can, but filename length, maybe not. sqlite3 data types aren't enforced, that is correct, but they do usually act as hints to your ORM or library.
:yousoro:
> For a small enough program, that's fine. But you should probably wrap it in an object so you don't have to deal with details of the query or the DB in other functions, keep the logic clean so functions are topical and legible.
I will look into this; I'm not sure how I would go about it at all, since I have a very surface-level understanding of OOP.
> I don't know how Python deals with iterators, maybe better to model a cursor as a thing that yields rows to whoever asks.
I'm passing it around to execute relevant queries on it.
@laurel@Pi_rat@p@SuperDicq Abstracting your database connection into a class is usually not strictly necessary unless you need the abstraction. For instance, if you were trying to support many database connectors, not just one. You would wind up needlessly proxying function calls and little else. Unless you are going to be implementing novel or QoL features for yourself it's not really necessary in at least this specific case.
There's a separate approach where Path would be a model that itself interfaces with the db, e.g. Path.add(path) or Path(path).delete(), which is a pattern you might see if you were using an ORM like sqlalchemy, or the django ORM.
There's more than one way to skin a cat, but his approach and his scope are well matched; there is neither the requirement nor the need to inject OOPisms when the imperative code is clean and succinct.
> I will look into this, Im not sure how I would go about it at all since I have very surface level understanding of OOP.
I think he meant to make a class specifically for database operations. The database connection/initialization code will go into the class init, and the cursor will be a class variable only accessible to the member functions. Then in your main routine you instantiate the class, say an object called db, which you then use to call the methods, providing only the information necessary. For instance db.addtag(path, newTagString), db.deletePath(path), etc.
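Something like this minimal sketch; the class and method names are just the hypothetical ones from my description, not from the real script:

```python
import sqlite3

class TagDB:
    # Hypothetical wrapper class for illustration only.
    def __init__(self, path=":memory:"):
        # Connection/initialization lives in init; the connection is
        # only touched through the methods below.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "create table if not exists tags (path text, tag text)")

    def addtag(self, path, tag):
        self._db.execute("insert into tags values (?, ?)", (path, tag))
        self._db.commit()

    def tags_for(self, path):
        rows = self._db.execute(
            "select tag from tags where path = ?", (path,))
        return [r[0] for r in rows]

db = TagDB()
db.addtag("cat.jpg", "animal")
print(db.tags_for("cat.jpg"))  # → ['animal']
```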
It's been a while since I wrote any Python btw, I only use jupyter-notebook with some forecasting and data analysis libs nowadays.
> I will look into this, Im not sure how I would go about it at all since I have very surface level understanding of OOP.
OOP is a convenient way to model this, but essentially it's just keeping things organized by topic. Separation of concerns makes things much more legible is all.
> I'm passing it around to execute relevant queries on it.
Ah, okay. So what I was talking about was you run a query and then encapsulate the results in an iterator or something so the function that uses the results just gets fed this iterator that sends back rows and then it closes its own cursor when it gets to the last one and starts returning nils, whatever those are in Python. (I think it's `null`? I've written basically no Python from scratch, just editing Python. It's basically the same language as all the 90s scripting languages.)
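In Python that idea maps onto a generator; a rough sketch (schema invented for illustration), where the function owns the cursor and it gets closed once the rows run out:

```python
import sqlite3
from contextlib import closing

def iter_images(db, dir_id):
    # Generator that yields rows one at a time; the closing() block
    # shuts the cursor when the rows are exhausted (or the generator
    # is discarded), so callers never touch a cursor directly.
    with closing(db.cursor()) as cur:
        cur.execute(
            "select image from images where dir_id = ?", (dir_id,))
        yield from cur

db = sqlite3.connect(":memory:")
db.execute("create table images (dir_id integer, image text)")
db.executemany("insert into images values (?, ?)",
               [(1, "a.jpg"), (1, "b.jpg"), (2, "c.jpg")])
print([row[0] for row in iter_images(db, 1)])  # → ['a.jpg', 'b.jpg']
```

(In Python the end-of-iteration signal is StopIteration rather than a nil sentinel, and a for loop handles it for you.)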
If it's a singleton, it's not so bad to just make it globally accessible instead of passing it around. (For a good time, put it in a memoized function that opens the DB on first call; this is a little nicer than making it a global variable. Generally in a bigger project you'll end up including or writing something to manage a connection pool, but the code is still small and it's sqlite3 so no pool anyway. But don't get carried away writing code to manage things that you don't need managed yet, you know what I mean.)
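The memoized-accessor idea is a one-liner with functools.lru_cache; ":memory:" here stands in for the real db path:

```python
import sqlite3
from functools import lru_cache

@lru_cache(maxsize=None)
def get_db():
    # The body runs once, on first call; every later call returns the
    # exact same connection object from the cache.
    return sqlite3.connect(":memory:")

print(get_db() is get_db())  # → True
```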
@munir@RedTechEngineer@Pi_rat@feld@laurel@p@SuperDicq@hj it has a journal and everything, so it shouldn't be, but the only time I had a partition corrupt was in fact f2fs, after some scuffed kernel update. That was a system partition, though.
> I may not have made myself clear here: you need not fuss with rowids yourself, you may simply use integer primary key.
You were clear, I just phrased my sentence weird.
> In this case, the python context manager will automatically call the appropriate close function as you fall out of the indented block. ... This will still allow the callee function control over, e.g. committing to the database in case of a DML statement.
Noted, will pass the connection. (Personally, this will also look better than 2 levels of indentation.)
> Short of using an ORM to abstract away handling the database connection, carefully passing a connection object around is a-ok, and what the ORM is doing under the hood, anyway.
I was looking into sqlalchemy but, combined with everything else, my attention was spread too thin, so I left it. Nice to know it's not wrong to pass the connection.
> You're welcome! SQLite is one of my favorite pieces of software, too, if you couldn't tell! I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
Eyy!! :backtogab:
I forgot to ask yesterday: how should I be testing this? For now I manually open up the interpreter and db and test, but this gets very tedious. My guess is unittests + a test.db which would return expected row(s). I wanted to write them but procrastinated on this, and now I shudder at the magnitude of what I would have to cover; it probably won't be as bad once I get started and write a few.
Hope this werks, I was using new line and single space
> OOP is a convenient way to model this, but essentially it's just keeping things organized by topic... I think it's null?... small and it's sqlite3 so no pool anyway.
Noted. It's called "None" around these parts. And yes, I don't think this will be multi-threaded, not even the GUI version.
> But don't get carried away writing code to manage things that you don't need managed yet, you know what I mean.
> I forgot to ask yesterday: how should I be testing this? For now I manually open up the interpreter and db and test, but this gets very tedious. My guess is unittests + a test.db
For your needs, the built-in unittest module will probably suffice. You'll subclass a test case, write some setup code that is shared between tests, and then write your individual tests. The documentation will give you some good examples. I wouldn't worry about using 3rd-party testing frameworks for now (there are plenty).
> now I shudder at the magnitude of what I would have to cover
You've got ~100 loc, that's nothing. Your only wrinkle will be the test database, but just use an in-memory one (sqlite3.connect(":memory:")) and it will not be persisted to disk for you to clean up later. Or, if you must, you can write some teardown code in your test case that deletes the on-disk database after all the tests run. You've got options.
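A minimal unittest sketch along those lines (the schema and test are invented for illustration):

```python
import sqlite3
import unittest

class TagTests(unittest.TestCase):
    def setUp(self):
        # Fresh throwaway database for every test; nothing touches disk.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "create table tags (id integer primary key, tag text)")

    def tearDown(self):
        self.db.close()

    def test_insert_tag(self):
        self.db.execute("insert into tags (tag) values ('meme')")
        row = self.db.execute("select tag from tags").fetchone()
        self.assertEqual(row[0], "meme")
```

Drop that in a file like (hypothetically) test_tagg.py and run python -m unittest from the project directory; setUp/tearDown run around every test method.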