kaoudis (kaoudis@infosec.exchange), 18-Sep-2024 17:56:04 JST:
If you’re generating code, and you’re *not* doing it with an LLM, is it reasonable to use metrics like F1 and recall to measure how well the tools you use are doing? This is bothering me because it feels a bit weird to apply metrics like this to static analyses, build tooling frameworks, or things that just plain don’t have any recall to begin with.
Conversation
Ryan Castellucci :nonbinary_flag: (ryanc@infosec.exchange), 18-Sep-2024 17:56:04 JST:
@kaoudis generating code, like, with build time scripts?
kaoudis (kaoudis@infosec.exchange), 18-Sep-2024 18:05:29 JST:
@ryanc Yeah, the case I was thinking about is when you want a bunch of semi-reasonable test cases for a compiler or something, and you generate a bunch of build variants.
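A minimal sketch of that kind of build-time variant generation, purely illustrative; the generated function, type list, and output directory are all made up here:

    # Hypothetical sketch: emit a small family of C source variants to use
    # as compiler test cases. Names, types, and paths are invented.
    from pathlib import Path

    def emit_variant(ctype: str, outdir: Path) -> Path:
        # One tiny translation unit per element type; a real generator would
        # vary much more (loop shapes, qualifiers, storage classes, ...).
        src = (
            "#include <stdint.h>\n"
            f"{ctype} sum(const {ctype} *xs, int n) {{\n"
            f"    {ctype} acc = 0;\n"
            "    for (int i = 0; i < n; i++) acc += xs[i];\n"
            "    return acc;\n"
            "}\n"
        )
        path = outdir / f"sum_{ctype}.c"
        path.write_text(src)
        return path

    if __name__ == "__main__":
        out = Path("variants")
        out.mkdir(exist_ok=True)
        for ctype in ("int32_t", "int64_t", "uint8_t"):
            print(emit_variant(ctype, out))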
kaoudis (kaoudis@infosec.exchange), 18-Sep-2024 18:24:15 JST:
@ryanc The thing in question is a talk I’m watching about using LLMs to figure out whether code variants are equivalent. As their baseline, they seem to have used precision, recall, and F1 to measure how well non-ML methods do at determining when code variants are equivalent.
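For reference, this is how those metrics are typically computed when a checker's equivalent/not-equivalent verdicts are scored against ground-truth labels on pairs of variants; the verdicts and labels below are made-up placeholders, not numbers from the talk:

    # Illustrative only: score a checker's equivalent/not-equivalent verdicts
    # against ground-truth equivalence labels. The example data is invented.
    def precision_recall_f1(predicted: list[bool], actual: list[bool]):
        tp = sum(p and a for p, a in zip(predicted, actual))      # true positives
        fp = sum(p and not a for p, a in zip(predicted, actual))  # false positives
        fn = sum(a and not p for p, a in zip(predicted, actual))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Checker said "equivalent" vs. whether the pair really was equivalent.
    verdicts = [True, True, False, True, False]
    truth    = [True, False, False, True, True]
    print(precision_recall_f1(verdicts, truth))  # (0.666..., 0.666..., 0.666...)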
Ryan Castellucci :nonbinary_flag: (ryanc@infosec.exchange), 18-Sep-2024 18:24:15 JST:
@kaoudis I have a number of personal projects that use build time code generation, some of it parameterized. Not sure if it would be useful for you to look at. The most complicated is a cryptographic hash library that generates HMAC, PBKDF2, and HKDF functions. I validate via test vectors.
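As a rough illustration of that validation style (not the actual library), here is a test-vector check against RFC 4231 test case 1, with Python's standard hmac module standing in for the generated code under test:

    # Illustrative only: check an HMAC-SHA-256 implementation against a
    # published test vector (RFC 4231, test case 1). Python's standard hmac
    # module stands in for the generated function being validated.
    import hashlib
    import hmac

    key = bytes([0x0B] * 20)
    data = b"Hi There"
    expected = bytes.fromhex(
        "b0344c61d8db38535ca8afceaf0bf12b881dc200c9833da726e9376c2e32cff7"
    )

    tag = hmac.new(key, data, hashlib.sha256).digest()
    assert hmac.compare_digest(tag, expected), "HMAC-SHA-256 test vector failed"
    print("HMAC-SHA-256 test vector passed")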