Notices by Mark Gritter (markgritter@mathstodon.xyz)

Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Sunday, 15-Jun-2025 12:20:45 JST Mark Gritter

We could just ... not give police tear gas. That is a decision we could make. A state or city could make that decision, and remove it from inventory.
Maybe police do not actually need area-denial weaponry present at every public gathering. Maybe "people are throwing bottles at us, therefore tear gas" need not be the standard playbook.
The medical research on the _safety_ of CS gas suggests a spectrum from "unknown" to "probably dangerous, particularly the way it actually used": https://www.propublica.org/article/tear-gas-is-way-more-dangerous-than-police-let-on-especially-during-the-coronavirus-pandemic https://med.umn.edu/news/u-m-study-shows-little-research-available-long-term-effects-tear-gas-use
I had less luck finding studies about its _effectiveness_ of CS gas for the presumed goals of police forces: officer safety and encouraging compliance with dispersal order. I found a lot of quotes from police along the lines of "well, the only alternative is bashing people with batons" One study of the introduction of CS spray in UK police forces (for personal protection, not crowd control) did not find a clear win: https://doi.org/10.1108/13639510010343065
Anecdotally, we might look at the current situation at LA, or that a few years ago in Minneapolis, and note that CS gas was not effective in reducing crowd antagonism, even if it does move people from one place to another.
In conversation about 10 days ago from mathstodon.xyz permalink
Attachments
1. Domain not in remote thumbnail source whitelist: img.assets-c3.propublica.org
  
  Tear Gas Is Way More Dangerous Than Police Let On — Especially During the Coronavirus Pandemic
  
  from @propublica
  
  In the middle of a respiratory pandemic, law enforcement agencies have used tear gas in especially dangerous ways. The chemical agent also seeps into homes, contaminates food, furniture, skin and surfaces, and can cause long-term lung damage.
2. Untitled attachment
3. No result found on File_thumbnail lookup.
  
  Arming a traditionally disarmed police: an examination of police use of CS gas in the UK
  
  The introduction of police use of CS gas within the UK has prompted widespread criticism. This article begins with the background to the introduction of CS gas, including the rationale behind its use. This is followed by an elucidation of the concerns and problems ensuing from its use, namely danger to health, police use/misuse, its effectiveness as a deterrent to police assaults, and police accountability. Throughout the article a number of recent cases are discussed.
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Monday, 28-Apr-2025 11:01:13 JST Mark Gritter

This is yet another case of an IRB basically deciding that people on the internet aren't real: https://www.reddit.com/r/changemyview/comments/1k8b2hj/meta_unauthorized_experiment_on_cmv_involving/
I am reminded of the University of Minnesota experiment a few years ago that sent deliberately bad patches to the Linux kernel, and the IRB decided this wasn't "human experimentation" so no consent was necessary: https://www.theverge.com/2021/4/30/22410164/linux-kernel-university-of-minnesota-banned-open-source
This should be obvious to everybody in the year 2025, but if it would be unethical to involuntarily subject a visitor to your campus to an experiment, then it is also unethical to do it on a message board or email list. You cannot enroll subjects in interventional studies without their consent. This should be absolutely clear to everyone on the IRB. But because you put the Internet in the middle, somehow they end up deciding it's OK after all.
You also should not perform experiments that violate the rules of the space you're operating in! That's also an ethical principle. These boundaries are also not less real because they are on the Internet.
In conversation about 2 months ago from mathstodon.xyz permalink
Attachments
1. Untitled attachment
2. Untitled attachment
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Monday, 28-Apr-2025 02:56:21 JST Mark Gritter

LLM attacks frequently claim to access the "system prompt" (and I can't believe we're still building the future on a technology without use/mention distinction.)
But my question has always been: how do you know? The LLM produced something that is plausibly a system prompt. But LLMs are good at producing plausible text!
All it takes would be a little bit of methodology -- is the output always the same? Is it the same for different attack vectors? Or did you get something merely system-prompt shaped? If you just ask the LLM to write a generic system prompt, how close would you get?
(I'm particularly skeptical because the blog post is selling a technology that would "fix" this.)
https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/
via https://circumstances.run/@davidgerard/114407046617549385
In conversation about 2 months ago from mathstodon.xyz permalink
Attachments
1. Untitled attachment
2. No result found on File_thumbnail lookup.
  
  David Gerard (@davidgerard@circumstances.run)
  
  from David Gerard
  
  yet again, you can bypass LLM “prompt security” with a fanfiction attack https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/ not Pivoting cos (1) the fanfic attack is implicit in building an uncensored compressed text repo, then trying to filter output after the fact (2) it’s an ad for them claiming they can protect against fanfic attacks, and I don’t believe them
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Friday, 17-Jan-2025 16:57:16 JST Mark Gritter
in reply to
- Luke T. Shumaker
@lukeshu Thanks.
I guess what I need to do is figure out some time in the last couple hundred years that has the same alignment as the actual zero time (using Truncate on the largest interval I care about.)

In conversation about 5 months ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Friday, 17-Jan-2025 15:01:30 JST Mark Gritter

Go's zero time is year 1, not 1970 or something Unix-y like that. That's fine.
Except:
func main() {
origin := time.Time{}
t := time.Date(2025, 1, 16, 4, 0, 0, 0, time.UTC)
d := t.Sub(origin)
t2 := origin.Add(d)
fmt.Println(t2)
}
results in 0293-04-11 23:47:16.854775807 +0000 UTC
https://go.dev/play/p/turgQXfyJG-
There's obviously an integer overflow happening here and it's too late in the day for me to figure out how to work around it.
What I need to do is truncate times to a 64-hour or 256-hour boundary. Our existing Go code truncates relative to Go's zero time. TimescaleDB truncates relative to 2001-01-01, or 2001-01-03 in some cases when calculating buckets. It seems challenging to write code that handles both, if the invariant t.Sub(origin).Add(origin) == t does not hold.
#golang

In conversation about 5 months ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Saturday, 11-Jan-2025 17:32:46 JST Mark Gritter

A #ComputingHistory question that came up today: what is the origin of | (the vertical stroke) as bitwise OR in PL/I, and thence to the C family of languages?
I haven't been able to trace it further back, and interestingly in logic it was the "Sheffer Stroke", NAND (although Wikipedia claims that Sheffer actually used it for NOR instead?) There does not seem to be a logic or typesetting convention that birthed |.
I don't know enough about early IBM keyboards to know what other characters might be available.
The choice of & for AND instead of ^ -- it's right there! -- is similarly unclear.

In conversation about 6 months ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Tuesday, 07-Jan-2025 14:05:16 JST Mark Gritter
in reply to
- Paul Cantrell
@inthehands OK, you are a person on the Internet (and also one I've met in real life) and I kind of want to argue with you whether or not you accomplished the challenge. :)
But the tragedy here is that I have felt the same way about LLMs _even though_ I know that it is futile. Once you are chatting in a textbox some sort of magic takes over where we ascribe intentionality.

In conversation about 6 months ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Tuesday, 07-Jan-2025 10:53:42 JST Mark Gritter

"Do not treat the generative AI as a rational being" Challenge
Rating: impossible
Asking a LLM bot to explain its reasoning, its creation, its training data, or even its prompt doesn't result in an output that means anything. LLMs do not have introspection. Getting the LLM to "admit" something embarrassing is not a win.

In conversation about 6 months ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Thursday, 12-Dec-2024 11:16:28 JST Mark Gritter

"Anyhow, when a crypto founder couldn’t find a bank in 2011, one could be excused for blaming reflexive banker conservatism and low levels of technical understanding. Crypto has had a decade and a half to develop a track record to be judged on. Crypto is being judged on that track record."
https://www.bitsaboutmoney.com/archive/debanking-and-debunking/
An article on "debunking" explaining why it is sometimes Kafkaesque (you got an SAR but the bank is not allowed to tell you that so it institutionally forgets the fact as soon as possible) and sometimes "duh, the bank management can read the paper too" (banks have gotten badly burned by servicing crypto companies, and the profit in doing so is very low.)
In conversation about 7 months ago from mathstodon.xyz permalink
Attachments
1. Untitled attachment
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Sunday, 13-Oct-2024 02:18:03 JST Mark Gritter

I think this is very interesting but really very _basic_ research on the capabilities of LLMs at reasoning problems. Any random PhD should be able to move from a static benchmark of math problems to a distribution of similar problems. That's exactly what this paper does, and discovers:
1. All current models do worse at GSM8K-like problems than they do on GSM8K itself, and there is wide variation in success for different samples.
2. #LLM performance varies if you change the names. It changes even more if you change the numbers. Change both, and you get even more variation.
3. Adding more clauses to the word problems makes the models perform worse.
4. Adding irrelevant information to the word problems makes the models perform worse.
5. Even the latest o1-minit and o1-preview models, while they score highly, show the same sort of variability.
I think this is the bare minimum we should be expecting of "AI is showing reasoning behavior" claims: demonstrate on a distribution of novel problems instead of a fixed benchmark, and show the distribution instead of the best results.
It's not that humans don't share similar biases -- plenty of middle-school students are tripped up by irrelevant data too -- but I think results like this show we are very far off from any sort of expert-level LLMs. If they show wide distribution of behavior on tasks that are easy to measure, it's quite likely the same is true on tasks that are harder to measure.
https://arxiv.org/abs/2410.05229
In conversation about 9 months ago from mathstodon.xyz permalink
Attachments
1. No result found on File_thumbnail lookup.
  
  results.it
  
  This domain may be for sale!
2. For six models, graphs showing how the behavior on GMS-Symbolic compares with the standard GMS8K benchmark. The graphs show the distribution of outcomes when names are varied, when the numbers are varied, and when both are varied. Typically these three distributions increase in width and have decreasing averages. A notable counterexample is Llama3-8b-instruct, where all pretty much coincide (and are better than GSM8K) but the base accuracy is only 74%.
  https://media.mathstodon.xyz/media_attachments/files/113/295/496/240/567/803/original/f46cf2d72f5014f7.png
3. Four graphs showing similar results for the o1-mini and o1-preview models. A wide variation in performance on GSM-Symbolic vs. GSM8K, with o1-preview showing a substantially worse behavior on the synthetic benchmark. The other two graphs show adding or removing clauses from GSM-style problems.
  https://media.mathstodon.xyz/media_attachments/files/113/295/525/945/775/697/original/dee72d422645c0eb.png
4. Domain not in remote thumbnail source whitelist: arxiv.org
  
  GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
  
  Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models.Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn't contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Monday, 26-Aug-2024 10:31:35 JST Mark Gritter

From Max Kreminski at #bangbangcon: Humans are worse at ideation when they use ChatGPT, compared to Oblique Strategies.
The paper he coauthored: https://arxiv.org/abs/2402.01536
Projects he talked about:
Blabrecs: beat the classifier at making up nonsense English-y words. https://mkremins.github.io/blabrecs/
Blabwreckage: start with either a real poem or complete gibberish, then "wreck" it into something vaguely language-shaped. https://mkremins.github.io/blabwreckage/ (I don't think you can provide your own seed? At least, not without hacking the Javascript?)
Savelost: remove one letter at a time from a sentence, attempting to preserve meaning. (I'm curious how a human would do at this task in comparison.) https://barrettrees.com/savelost/
In conversation about 10 months ago from mathstodon.xyz permalink
Attachments
1. Untitled attachment
  https://media.mathstodon.xyz/media_attachments/files/113/025/657/027/543/520/original/e583a36a792ddb24.png
2. Untitled attachment
3. Untitled attachment
4. No result found on File_thumbnail lookup.
  
  Blabwreckage
5. Domain not in remote thumbnail source whitelist: barrettrees.com
  
  savelost
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Friday, 21-Jun-2024 02:49:00 JST Mark Gritter
in reply to
- Paul Cantrell
@inthehands So capitalism, but only on existing resources, and without any competition? Like, was he _trying_ to make the point that rent-seeking is bad? Capitalism But Everything is A Monopoly?
Maybe I'm making the unfair assumption that nobody could bring in extra chairs to compete and actually establish some sort of market, but I kind of doubt it.

In conversation about a year ago from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Monday, 13-Nov-2023 09:50:23 JST Mark Gritter

I am an old now because I remember the last time "run compiled code in the browser" was a thing. But I'm not sure anybody else does, with all the takes about how running a virtual machine (and Java!) in the browser is such a great thing?
Is there a good historically-minded writeup about how #WebAsssembly differs from the last browser VM hype cycle?
In conversation Monday, 13-Nov-2023 09:50:23 JST from mathstodon.xyz permalink
Attachments
1. "How to run Java in the browser with Web assembly. There's an easy way to run Java in the browser now that We assembly is a W3C standard..."
  https://media.mathstodon.xyz/media_attachments/files/111/400/320/266/105/821/original/b99df3b49e61d72c.jpg
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Thursday, 18-May-2023 21:59:00 JST Mark Gritter

TIL that there are Dudes on Quora who feel the need to respond to questions about linear inequalities with anti-woke messaging.

In conversation Thursday, 18-May-2023 21:59:00 JST from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Wednesday, 10-May-2023 02:29:11 JST Mark Gritter

I did not realize you could create a unicursal #maze out of an ordinary branching maze by placing a wall in the middle of each corridor!
https://twitter.com/aemkei/status/1388610729855553549
This makes sense in that it's sort of enforcing the "right-hand rule" by taking away all your choices, guiding you along the path that the RHR would take on the original maze -- and that path is deterministic and thus necessarily branch-free.

In conversation Wednesday, 10-May-2023 02:29:11 JST from mathstodon.xyz permalink
Embed this notice
Mark Gritter (markgritter@mathstodon.xyz)'s status on Wednesday, 03-May-2023 14:06:33 JST Mark Gritter
in reply to
- Paul Cantrell
@inthehands Very nice!
I'm frustrated by how long we've heard "use composition not inheritance" and yet language support for composition is still so poor. (I like what Swift does -- but my day-to-day language right now is Go and like everything else in Go the solution is "write a bunch of boilerplate so that the function names match up.")

In conversation Wednesday, 03-May-2023 14:06:33 JST from mathstodon.xyz permalink

Public

Notices by Mark Gritter (markgritter@mathstodon.xyz)

User actions

Following 0

Followers 0

Groups 0

Statistics

Feeds