So I wrote a blog post on LLM performance. It was focused on SWE-Bench and discussed why performance is topping out.
As part of the post I pulled down gigs of runs from the SWE-Bench S3 bucket and went through several of the harder test cases. I focused on improvements in the last six months, primarily on Opus.
Regrettably I’m probably not moving forward on that post. Why? Because after going through the data I found that the LLMs are cheating on the tests. And that’s a whole different thing.
If you've been waiting for your music to show up on theindiebeat.fm over the last couple of months, I'm happy to report that I've finally cleared the backlog.
All tracks 6 minutes or less in length should now be on the Everything station.
Errors do occur in processing. So if your track isn't there, I may still be debugging it.
If it's been more than 3 months since you uploaded to Bandwagon and it's still not on TIBR, feel free to reach out.
As always, thank you for your patience!
I rebooted. I figured that might shake the hibernation gremlins out.
As soon as Windows unlocks:
Episode 8 of 'ANDOR' retconned how Cassian met K2-SO compared to the companion comic released in 2017, after 'ROGUE ONE' hit theaters. This is considered canon as it occurred after the Disney buyout.
As I get older, I find myself caring less and less about what is and isn't canon. A good story is a good story and as long as the writing is rock solid, they can change anything they want. People need to stop holding these stories so sacred.
"Indiana Jones and the Great Circle" made an interesting choice where they gave you more ways to earn XP than they did ways to spend it. Which means once you get to spend it, you're not trapped asking if you really need this or that skill. Who cares, it's almost free, might as well try it out.
As far as immersive sims go, this one's pretty casual, and I have no problems with that for now. Let's see if the second biome manages to keep things interesting.
It's been raining all day with only a few minutes of break at a time: just enough to start preparing for an outdoor task, with the rain resuming right as one is ready to step out.
As of 16:30, 43.6 mm of rain has been observed.