Not enough time, too many projects. Useful projects I did over the weekend with Opus 4.6 and GPT 5.4 (just casually chatting with it).
2025 Taxes
Dumped all PDFs of all my tax forms into a single folder and asked Claude to rename them nicely. Asked it to use Gemini 2.5 Flash to extract all tax-relevant details from all statements / tax forms. Had it put together a web UI showing all income, deductions, etc. for the year. Had it estimate my 2025 tax refund / underpayment.
Result was amazing. I now actually fully understand the tax position. It broke down all the progressive tax brackets and added notes for all the extra federal and state taxes (e.g. Medicare, CA Mental Health tax, etc).
Finally had Claude prepare all of my docs for upload to my accountant: FinCEN reporting, summary of all docs, etc.
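For anyone wondering what "breaking down the progressive brackets" amounts to, here is a minimal sketch of the arithmetic. The thresholds and rates are made-up round numbers for illustration, not real 2025 federal or California tables, and `progressive_tax` is a hypothetical helper name, not something from the actual project.

```python
from typing import List, Tuple

# Hypothetical brackets: (upper bound of bracket, marginal rate).
# Illustrative numbers only, NOT real tax tables.
BRACKETS: List[Tuple[float, float]] = [
    (10_000, 0.10),
    (50_000, 0.20),
    (float("inf"), 0.30),
]


def progressive_tax(taxable_income: float, brackets=BRACKETS):
    """Return (total_tax, breakdown) for a progressive schedule.

    breakdown is a list of (lower, upper, rate, tax_in_bracket) tuples.
    """
    total = 0.0
    breakdown = []
    lower = 0.0
    for upper, rate in brackets:
        if taxable_income <= lower:
            break
        # Only the slice of income inside this bracket is taxed at its rate.
        slice_amount = min(taxable_income, upper) - lower
        tax = slice_amount * rate
        breakdown.append((lower, upper, rate, tax))
        total += tax
        lower = upper
    return total, breakdown


if __name__ == "__main__":
    total, rows = progressive_tax(60_000)
    for lower, upper, rate, tax in rows:
        print(f"{lower:>10,.0f} to {upper:>10,.0f} at {rate:.0%}: {tax:,.2f}")
    print(f"total: {total:,.2f}")
```

The point of the per-bracket breakdown is that only the income inside each slice is taxed at that slice's rate, which is the part people most often get wrong about marginal rates.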
Desk Fabrication
Planning on having a furniture maker fabricate a custom solid walnut desktop for a custom office standing desk. Want to create a STEP file of the exact cuts / bevels / countersinks / etc to help with fabrication.
Worked with Codex to plan out and then build an interactive in-browser 3D CAD experience. I can ask Codex to add some component (e.g. a grommet) and it will generate a parameterized B-rep geometry for that feature and then allow me to control the parameters live in the web UI.
Codex found the Open CASCADE Technology (OCCT) B-rep modeling library, which has a WebAssembly-compiled version, and integrated it.
Now have a WebGL view of the desk, can add various components, change their parameters, and see the impact live in 3D.
What scares me though is how I've (still) seen ChatGPT make up numbers in some specific scenarios.
I have a ChatGPT project with all of my bloodwork and a bunch of medical info from the past 10 years uploaded. I think it's more context than ChatGPT can handle at once. When I ask it basic things like "Compare how my lipids have trended over the past 2 years" it will sometimes make up numbers for tests, or it will mix up the dates on certain data points.
It's usually very small errors that I don't notice until I really study what it's telling me.
And also the opposite problem: A couple days ago I thought I saw an error (when really ChatGPT was right). So I said "No, that number is wrong, find the error" and instead of pushing back and telling me the number was right, it admitted to the error (there was no error) and made up a reason why it was wrong.
Hallucinations have gotten way better compared to a couple years ago, but at least ChatGPT seems to still break down especially when it's overloaded with a ton of context, in my experience.
In my case, what I like to do is extract data into machine-readable format and then once the data is appropriately modeled, further actions can use programmatic means to analyze. As an example, I also used Claude Code on my taxes:
1. I keep all my accounts in accounting software (originally Wave, then beancount)
2. Because the machinery is all programmatically queryable, the data itself never sits in token-space, only the schema and logic do
I then use tax software to prep my professional and personal returns. The LLM acts as a validator, and ensures I've done my accounts right. I have `jmap` pull my mail via IMAP, my Mercury account via a read-only transactions-only token and then I let it compare against my beancount records to make sure I've accounted for things correctly.
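The comparison step can be sketched roughly like this, assuming both sides have already been exported into plain records. `Txn` and `reconcile` are hypothetical names for illustration; they are not beancount or Mercury types, and a real version would load the two lists from the ledger and the bank's API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """Hypothetical normalized transaction (not a beancount or Mercury type)."""
    day: date
    amount: Decimal  # Decimal avoids float rounding on money
    payee: str


def reconcile(bank, ledger):
    """Return (missing_from_ledger, missing_from_bank) as sorted lists.

    Counter is used as a multiset so that two identical transactions on the
    same day are matched one-to-one instead of collapsing into one.
    """
    bank_counts = Counter(bank)
    ledger_counts = Counter(ledger)
    # Counter subtraction keeps only positive counts, i.e. the unmatched ones.
    missing_from_ledger = list((bank_counts - ledger_counts).elements())
    missing_from_bank = list((ledger_counts - bank_counts).elements())

    def sort_key(t):
        return (t.day, t.amount, t.payee)

    return (
        sorted(missing_from_ledger, key=sort_key),
        sorted(missing_from_bank, key=sort_key),
    )
```

The model only has to write and run something like this; the matching itself happens outside token-space, and the output is a short list a human can eyeball.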
For the most part, you want it to be handling very little arithmetic in token-space, though the SOTA models can do it pretty flawlessly. I did notice that they would occasionally make arithmetic errors in numerical comparison, but when using them as an assistant you're not relying on them directly but using them as a hypothesis generator and a checker tool, and if you ask it to write out the reasoning it's pretty damned good.
For me Opus 4.6 in Claude Code was remarkable for this use-case. These days, I just run `,cc accounts` and then look at the newly added accounts in fava and compare with Mercury. This is one of those tedious-to-enter trivial-to-verify use-cases that they excel at.
To be honest, I was fine using Wave, but without machine-access it's software that's dead to me.
For the tax thing, I had Claude write a CLI and a prompt for Gemini 2.5 Flash to do the structured extraction, i.e. .pdf -> JSON. The JSON schema was pretty flexible, and open to interpretation by Gemini, so it didn't produce 100% consistent JSON structures.
To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.
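A rough sketch of what that kind of aggregation tool tends to converge on is below. The field names and the alias table (`wages`, `gross_wages`, `box1`, and so on) are hypothetical stand-ins for whatever keys the extractor actually emitted; the real tool was iterated against the real files.

```python
import json
from decimal import Decimal, InvalidOperation
from pathlib import Path

# Hypothetical aliases: different keys the extractor might use for one field.
ALIASES = {
    "wages": ("wages", "gross_wages", "box1"),
    "federal_withheld": ("federal_withheld", "fed_tax_withheld", "box2"),
}


def to_decimal(value):
    """Parse amounts that may arrive as 1234.5, "1,234.50" or "$1,234.50"."""
    if value is None:
        return None
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        return Decimal(text)
    except InvalidOperation:
        return None  # unparseable values are skipped rather than guessed


def normalize(record: dict) -> dict:
    """Map one loosely structured record onto the canonical field names."""
    out = {}
    for canonical, names in ALIASES.items():
        for name in names:
            if name in record:
                out[canonical] = to_decimal(record[name])
                break
    return out


def aggregate(folder: Path) -> dict:
    """Sum every canonical field across all *.json files in a folder."""
    totals = {name: Decimal("0") for name in ALIASES}
    for path in sorted(folder.glob("*.json")):
        fields = normalize(json.loads(path.read_text()))
        for name, amount in fields.items():
            if amount is not None:
                totals[name] += amount
    return totals
```

Each exception in the run-fix loop typically turns into one more alias or one more parsing rule, which is why a few iterations were enough.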
Yeah, in my user prompt I have "Whenever you are asked to perform any operation which could be done deterministically by a program, you should write a program to do it that way and feed it the data, rather than thinking through the problem on your own." It's worked wonders.
Yeah, asking for a tool to do a thing is almost always better than asking for the thing directly, I find. LLMs are kind of not there in terms of always being correct with large batches of data. And when you ask for a script, you can actually verify what's going on in there, without taking leaps of faith.
It's not good in some job negotiations if someone has a very clear picture of what your current net worth and income is. Also in some purchases companies could price discriminate more effectively against you.
Now that's a question I'd feel more confident having answered by an LLM. Personally, I'm tired of arguing with "nothing to hide", which (no offense) is just terribly naive these days.
I had AI hallucinate that you can use different container images at runtime for EMR Serverless. That was incorrect; it's only at application creation time.
I feel pretty productive myself with AI but this list isn’t beating the rap that AI boosters mostly use AI to do useless stuff focused on pretending to improve productivity or projects that make it easier to use AI.
Some of them definitely do not. Like a fictional encyclopedia? What is the point of that? That's like "an alphabetical novel".
And even for the ones that might "beat the rap", I don't understand from your descriptions why they are interesting or unique. A voice note recorder? Cool. There are already hundreds if not thousands of those, why did you need to make your own in the first place? I'm not saying that yours isn't special, I'm just saying that it doesn't help to post the blandest description possible if you're trying to impress people with the utility of your utility.
So not only does he have to show what he built with AI, what he built with AI has to be interesting and unique to you? Why? He's not selling it to you.
Seems like the bar is now it has to be a mass market product. On another post someone else commented a SaaS doesn't count if it doesn't earn sustainable revenue.
I guess OpenClaw also doesn't count because we don't know how much Peter got from OpenAI.
This is an ideological flame war, not a rational discussion. There's no convincing anyone.
Sounds like the goalposts are moving from "not useless stuff focused on pretending to improve productivity or projects that make it easier to use AI" to "extremely useful stuff".
One issue is that I interpreted the parent as OR, not AND. "useless stuff OR productivity tools OR AI tools".
Moreover though, I'm not even saying you shouldn't do those things. I'm actually playing around with AI quite a bit, and certainly have created my share of useless/productivity tools. But it's not a flex to show off your own Flappy Birds or OpenNanoClaw clone, even if they are written in COBOL or MUMPS.
And they definitely do not have to be "extremely useful". But they should answer the question: what problem does it solve?
Fair. But finally we are seeing what LLM proponents are putting forward.
And it’s exactly what I expected - lines of code. Cute. But… so what? This is not good for the AI hype, nor for any continued support for future investment.
On the other hand all this stuff is going to drive continual innovation. The more tokens generated the more model producers invest. And we might eventually get to a place of local models.
I have the opposite experience: the number of AI boosters deriding the less enthusiastic, gleefully exclaiming how someone will be "left behind" if they don't immediately adopt the latest hype cycle, or sharing AI slop and either embellishing or outright lying about its capabilities is making me want to log off forever. "Handwritten code? Don't you only care about providing maximum shareholder value?" No.
The user you're responding to lists a "blood test viewer" [0], which looks to be a tool that turns his blood test PDFs into structured and analyzed data. You're saying that unless he continuously revises/upgrades the code, it's still "abandonware" even if it meets his needs for the near future?
Bit rot is real. The dependencies listed here include calling into AI APIs that will stop working with time. So yes if no one keeps this up to date it will rot into useless likely very quickly.
That’s not even mentioning that this tool doesn’t do much beyond wrap a call to Claude. And it’s using Claude to display blood test data to the end user. This is not something I’d trust an LLM to not mess up. You’d really want to double check every single result.
Missing the point. I no longer need to buy or rely on someone else for software I want to use. A lot of things I want to do ARE one offs. I can write software and throw it away when I'm done.
I know this sounds sarcastic but I really mean it: For years everyone has been monastically extolling some variation of "the best code is deleted code". Now, we have a machine that spits out infinite code that we can infinitely delete. It's a blessing that we can have shitty code generated that exposes at light speed how shitty our ideas are and have always been.
Maybe, although it's actually giving me OCD, I think. It's really hard to tune out because of the irregular ticking. I implemented a regular mode to combat this, defeating the purpose somewhat.
Unpredictable things catch our attention - it's the exceptions that are important to survival, and our brains evolved to cope with the stimuli that this experiment messes with.
Something like this would be anxiety inducing for most people, I bet. That'd be an excellent experiment, track heart rate, EEG, and performance on a range of cognitive tasks with 2 minute long breaks between each tasks, one group exposed to the irregular ticking, another exposed to regular ticking, another with silence, and one last one with pleasant white noise.
It sounded fun (and it is)! My favorite mode is one that ticks each second imperceptibly fast, and then stalls for a second in one of the ticks (so that it lasts two).
It's just the right amount of "did that clock just skip a beat? Nah must just be my imagination".
I get the sentiment, but this is natural with a groundbreaking new technology. We are still in the process of figuring out how to best apply generative LLMs in a productive way. Lots of people tinker and share their results. Most is surely hype and will get thrown away and forgotten soon, but some is solid. And I am glad for it, as I did not take part in that but now enjoy the results, since the agents have become really good.
This is exactly the same reason why the appropriate question to ask about Haskell is "where are the open source projects that are useful for something that is not programming?"
The answer for Haskell after 3 decades is very, very little. Pandoc, git-annex, xmonad. Might be something else since I last did the exercise, but for Haskell the answer is not much. Then we examine why the kids (us kids of all ages) can't or don't write Haskell programs.
The answer for LLM coding may be very different. But the question "where is the software that does something that solves a problem outside its own orbit" is crucial. (You have a problem. You want to use foo to solve it, now you have two problems but you can use foo to solve a part of the second one!!)
The price of getting code written just went down. Where are the site/business launches? Apps? New ideas being built? Specifically. With links. Not general, hand-wavy "these are the sorts of things that ..." because even if it's superb analysis, without some data that can be checked it's indistinguishable from hype.
For instance, there is an abandoned open source project I would have liked to see revived, https://www.wickeditor.com/ (an attempt at recreating Flash with web technology). Current official state in the repo: outdated dependencies, build process, etc.
I looked into doing it manually, but gave up. Way too much dirty work, and I had no energy for that.
Then I discovered that claude CLI got good - and told it to do it (with some handholding).
And it did it. Build process modernized. No more outdated dependencies. Then I added some features I missed in the original wick editor. Again, it did it and it works.
A working editor that was abandoned and missed features - now working again with the missing features. With minimal work done from my side (but I did put in work before to understand the source).
I call this a very useful result. There are lots of abandoned half working projects out there. Lots of value to be recovered. Unlike Haskell, Agents are not just busy with building agents, but real tools.
Currently I have the agents refactor an old codebase of mine. Lots of tech debt. Lots of hacks. Bad documentation. There are features I wanted to implement for ages but never did, as I did not want to touch that ugly code again. But Claude did it. It is almost scary what they are already capable of.
I don't think you should feel like your personal projects need to be vetted by an armchair peanut gallery. It's actually kind of offensive how so many people show up in a thread like this and demand that what sparked joy for you be formally subjected to a gauntlet of moving goalpost validation markers.
Quite simply, I don't think that they are asking or arguing in good faith.
I've actually felt the same way about some (not all) but some "productivity" hacks I've seen people post online with their OpenClaw setups.
I chuckle when I see some of them because you could achieve the same (or often faster) result by jotting a note onto a notecard and sticking it in your pocket.
Most of the other automations running don't really seem to serve any real purpose at all.
I mean I’m using it to deconstruct and reinvent my development process from the ground up, but it’s so easy to do this now and so customized for my specific needs that the idea of posting about it never crossed my mind.
If you are a parent, you know that feeling when your child is struggling with something and gets frustrated, but you keep silent and don't help because you know that the child has to figure this out by themselves. That's the same feeling I get when I hear all those doom and gloom perspectives on how AI is ruining coding :D
From where I stand this thing is going to provide great leverage to those who don’t simply just write code. I personally doubt the thing will ever get to a place where it can be trusted to operate alone - it needs a team of people and to go super fast you need more people.
Moreover, the price won’t be high due to competition.
I’ve changed my view on LLMs as being good, as long as competition is fierce.
I completely agree. Most programmers work on rather boring and not particularly novel things. If they don't adapt, then they'll be replaced.
I do think it'll be a while before LLMs make significant contributions to complex projects, though. For example I can't imagine many maintainers of the Linux kernel use LLMs much.
I believe your skills are atrophying when you use these things no matter how trivial the case. That compounds with their bias towards solving problems by producing more code to further reduce your productivity without them.
And if we do adapt we might still get replaced because fewer of us will be able to do more. Or we won't because of Jevons Paradox. Linux maintainers on the other hand can code (with and without AI) what I could not (with or without AI). So in a way becoming a more knowledgeable, more skilled programmer is the way? In any case, too much speculation about the future.
I've written an Obsidian clone for myself, which has proper Emacs keybindings. Took me a few hours too many to get in all the features that I need.
What I find interesting is that I have little motivation to open source it. Making it usable for others requires a substantial amount of time, which would otherwise be just a fraction of the development time.
I have a theory (and I'm sure I am far from the first one to voice it) that the number of useful open source projects released to the public will be on the decline now that anyone can scratch their own itch with a few hours of vibe coding. Why would I spend hours evaluating a dozen different note-taking applications and _maybe_ find one that is _kinda close_ to what I want, if I can instead have Claude vibe me one up _exactly_ the way I want it?
(I actually did write my own note-taking application, but that was before LLMs were any good at writing code.)
I was thinking about doing the same. Build a clone with AI custom tailored for my own quirks. And not bothering to open source it because it's too bespoke for anyone else. How hard was this? Can you share any advice?
I've heard a few people say "I haven't written a single line of code since ..."
What do people think of it?
I personally don't think that's a badge of honor. Aside from losing your coding skills, you miss opportunities to generate AI pieces and connect them to existing systems that can't be fed into the AI. Plus, making small changes yourself is easier than having the AI make them without messing something else up.
I wouldn't say strictly speaking that I've written no code, but the amount of code I've written since "committing" to using Claude Code in February is absolutely minuscule.
I prefer having Claude make even small changes at this point since every change it makes ends up tweaking it to better understand something about my coding convention, standard, interpretation etc... It does pick up on these little changes and commits them to memory so that in the long run you end up not having to make any little changes whatsoever.
And to drive this point further, even prior to using LLMs, if I review someone's work and see even a single typo or something minor that I could probably just fix in a second, I still insist that the author is the one to fix it. It's something my mentor at Google did with me which at the time I kind of felt was a bit annoying, but I've come to understand their reason for it and appreciate it.
Sort of... Claude Code writes to a memory.md file that it uses to store important information across conversations. If I review mine it has plenty of details about things like coding convention, structure, and overall architecture of the application it's working on.
The second thing Claude Code does is, when it reaches the end of its context window, it runs /compact on the session, which takes a summary of the current session, dumps it into a file, and then starts a new session with that summary. But it also retains logs of all the previous sessions that it can use and search through.
Looking over my session of Claude Code, out of the 256k tokens available, about 50k of these tokens are used among "memory" and session summaries, and 200k tokens are available to work with. The reality is that the vast majority of tokens Claude Code uses is for its own internal reasoning as opposed to being "front-end" facing so to speak.
Additionally given that ChatGPT Codex just increased its context length from 256k to 1 million tokens, I expect Anthropic will release an update within a month or so to catch up with their own 1 million token model.
1. The closer the context gets to full the worse it performs.
2. The more context it has the less it weights individual items.
That is, Claude might learn you hate long functions and add a line about short functions. When that is the only thing in the context, it is likely to follow it very closely. But when it’s one piece of a much longer context, it is much more likely to ignore it.
3. Tokens cost money even if you are currently being subsidized.
4. You have no idea how new models and new system prompt will perform with your current memory.md file.
5. Unlike learning something yourself, anything you teach Claude is likely to start being controlled by your employer. They might not let you take it with you when you go.
I haven’t typed a line of code in like six months but I still review all production code and stay very connected to the codebase. I don’t feel my skills have withered at all.
What did you think of Dagger? I used Earthly a while ago but the one thing I didn't like was that it couldn't parallelize runs, since it only ran on one CI instance. Other than that, I liked that I could run my entire CI pipeline locally, but didn't like it so much that I ended up using it for much else.
I really like Dagger. I had a _lot_ of weird issues with Earthly, like edge cases. Dagger has been mostly solid.
It still has gaps. I don't think they've landed on the right model for CI. Like Earthly, their model is a CI runner + local cache. I believe a distributed cache (like Bazel) makes more sense.
If I were choosing between the two I'd personally always pick Dagger, but I think there is a strong argument for Earthly for simpler projects. If you're using multiple Earthfiles or a few hundred lines of Earthly, I think you've outgrown it.
I tend to only use LLMs to complete projects that are relatively unique and that haven't been done before. Because if I'm not going to get anything out of the journey, I might as well get something out of the destination.
*Piece Together*
An animated puzzle game that I built with a fairly heavy reliance on agentic coding, especially for scaffolding. I did have to jump in and tweak some things manually (the piece-matching algorithm, responsive design, etc.), but overall I’d estimate that LLMs handled about 80% of the work. It's heavily based on the concept of animated puzzles in the early edutainment game The Island of Dr. Brain.
Lend Me Your Ears is an interactive web-based game inspired by the classic Simon toy (originally by Milton Bradley). It presents players with a sequence of musical notes and challenges them to reproduce the sequence using either an on-screen piano, MIDI keyboard, or an acoustic instrument such as a guitar.
A voice-controlled blindfold chess game that uses novel approaches (last N pieces moved hidden, fade over time, etc). I've already been playing it daily on my walks.
It's based off an old word game where one person tries to come up with three words: sign, watch, bus. The other person has to think of a common word that forms compound-style words with each of them: stop.
I was quite surprised to see that this didn't exist online already.
Slide puzzles for qualified MENSA members. I built it for a friend who's basically a real-life equivalent of Dustin Hoffman's character from Rain Man. So you might have to rearrange a slide puzzle from the periodic table of elements, or the U.S. presidents by portrait, etc.
Transforms random words on web pages into different writing systems like Hiragana, Braille, and Morse Code to help you learn and practice reading these alphabets so you can practice the most functionally pointless task, like being able to read braille visually.
All of these were built with varying levels of assistance from agentic coding. None of them were purely vibe-coded and there was a great deal of manual and unit testing to verify functionality as it was built.
> All of these were built with varying levels of assistance from agentic coding. None of them were purely vibe-coded and there was a great deal of manual and unit testing to verify functionality as it was built.
It also seems like none of them are relatively unique and all of them have been done before.
Simon toy that's integrated into an ear training tool?
Blindfold chess with Last N moves hidden?
Mensa-style slide puzzles?
An extension that converts random words into phonetic equivalents like morse, braille, and vorticon?
I've also made some way less useful stuff, like a win32 app that lets you physically grab a window and hurl it, which invokes a WM_DESTROY when it is completely off the screen.
And an app that measures low frequencies to tell if you are blowing into the mic and then increases the speed of the CPU fan to cool it down.
> At work, all that matters is that value is delivered to the business. Code needs to be maintainable so that new requirements can be met. Code follows design patterns, when appropriate, because they are known solutions to common problems, and thus are easy to talk about with others. Code has type systems and static analysis so that programmers make fewer mistakes.
This is a narrow view of software engineering. Thinking that your role is "code that works" is hardly better than thinking you're a "(human) resource that produces code". Your job is to provide value. You do that by building knowledge, not only of the system you're developing but of the problem space you're exploring, the customers you're serving, the innovations you can do that your competitors can't.
It's like saying that a soccer player's purpose is "to kick a ball" and therefore a machine that launches balls faster and further than any human will replace all soccer players, and soon all professional teams will be made up of robots.
I think your view is sentimental. For businesses the code usually IS the value, and devs ARE human resources that produce code. It sounds cynical, but it’s basically how most orgs operate. From the company’s POV employees function as cogs in a larger system whose purpose is to generate value considering that businesses are structured to optimize outcomes i.e. Profit. If tech appears that can produce the same output more cheaply or efficiently, companies will most definitely as we've seen so far explore replacing people with it. I mean, take a look at the corporate posture around LLMs. I do get the point you’re making about knowledge, domain understanding, and solving real problems, because those things clearly matter in practice. But from the company’s POV, they matter only because they help produce better code/systems, which are still the concrete artifact that embodies the business logic and operations: a symbolic model of the business itself, encoded in software. So the framing of devs as human resources that produce code, and code as the primary value, correctly describes how many businesses see the relationship. And I don't really see the equivalence between SWE-ing in a business context and sports.
> From the company’s POV employees function as cogs in a larger system whose purpose is to generate value considering that businesses are structured to optimize outcomes i.e. Profit. If tech appears that can produce the same output more cheaply or efficiently, companies will most definitely as we've seen so far explore replacing people with it.
Businesses wish this were the case, and many will even say it or start to believe it. But it doesn't bear out to be true in practice.
Think about it this way, engineers are expensive so a company is going to want to have as few of them as possible to do as much work as possible. Long before LLMs came along there have been many rounds of "replace expensive engineers" fads.
Visual programming was going to destroy the industry, where any idiot could drag and drop a few boxes and put together software. Turns out that didn't work out and now visual programming is all but dead. Then we had consultants and software consultancies. Why keep engineers on staff and have to deal with benefits and HR functions when you can hire consultants for just long enough to get the job done and end their contracts. Then we had offshoring. Why hire expensive developers in markets like California when you can hire far cheaper engineers abroad in a country with lower wages and laxer employment law. (It's not a quality thing either, many of these engineers are unquestionably excellent.)
Or, think about what happens when software companies get acquired. It's almost unheard of for the acquiring company to lay off all of the engineering staff from the acquired company right away; if anything it's the opposite, with vesting incentives to convince engineers to stay.
If all that mattered was the code and the systems, and people were cogs that produced code that businesses wanted to optimise, then none of these actions make sense. You'd see companies offshore and use consultants with the company that does "good enough" as cheaply as possible. You'd see engineers from acquisitions be laid off immediately, replaced with cheaper staff as fast as possible.
There are businesses that operate like this; it happens all the time. But all of the most successful and profitable tech companies in the world don't do this. Why?
> Speaking in the context of solving a problem: does AI need to write beautiful code? No. It needs to write code that works. The code doesn’t need to be maintainable in the traditional sense. If you have sufficient tests, you can throw some LLMs at a pile of “bad” code and have them figure it out.
Code doesn't need to be "beautiful", but the beauty of code has nothing to do with maintainability. Linus once said "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." The actual hard part of software is not the code, it's what isn't in the code - the assumptions, relationships, feedback loops, emergent behaviours, etc. Maintainability in that regard is about system design. Imagine software as a graph, the nodes being pieces of code and the edges being those implicit relationships. LLMs are good at generating the nodes but useless at the edges.
The only thing that seems to work is to have validation criteria (e.g. a test suite) that the LLM can use to do a guided random walk towards a solution where the edges and nodes align to satisfy the criteria. This can be useful if what you are doing doesn't really matter, like in the case of all the pet projects and tools people share. But it does matter if your program assumes responsibility somewhere, like if you're handling user data. This idea of guardrail-style programming has been around for a while, but nobody drives by bouncing off the guardrails to get to their destination, because it's much more efficient to encode what a program should do instead of what it shouldn't, which is the case with this type of mega-test-driven-development. Is it more efficient to tell someone where not to go when giving directions as opposed to telling them how to get there?
Take the Cloudflare Next.js experiment for example - their version passed all the Next.js tests but still had issues because the test suite didn't even come close to encoding how the system works.
So no, you still need to care about maintainability. You don't need to obsess over code aesthetics or design patterns or whatever, but you never needed to do that. In fact, more than ever programmers need to be concerned with the edges of their software and how they can guide the LLMs to generate the nodes (code) while maintaining the invariants of the edges.
The whole “you can just throw LLMs at the test suite and regenerate the code” thing needs to die because it can’t work for any software that has users. A test suite cannot feasibly cover every observable behavior. Every time you regenerate the code this way you’ll change a huge chunk of the thousands of little observable behaviors that aren’t fixed in place by the test suite. You can’t do this if you have users.
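A toy illustration of the point, with hypothetical function names: two implementations that both pass the same suite but differ in a behavior the suite never pinned down (here, output order). Regenerating from the tests could silently swap one for the other.

```python
def unique_v1(items):
    """Deduplicate, keeping first-seen order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def unique_v2(items):
    """Deduplicate, returning items in sorted order."""
    return sorted(set(items))


def passes_test_suite(unique):
    """A hypothetical suite that only checks membership and length."""
    result = unique(["b", "a", "b", "c", "a"])
    return set(result) == {"a", "b", "c"} and len(result) == 3


if __name__ == "__main__":
    data = ["b", "a", "b", "c", "a"]
    print(passes_test_suite(unique_v1), passes_test_suite(unique_v2))  # True True
    print(unique_v1(data))  # ['b', 'a', 'c']
    print(unique_v2(data))  # ['a', 'b', 'c']
```

Any user who came to rely on the first-seen ordering breaks on the second version, even though every test is still green.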
Similarly to your directions analogy, I’ve been using the analogy of trying to ensure that a 1,000-restaurant franchise produces the exact same peanut butter sandwich for every customer.
It’s much easier to figure out the primitives that your employees understand and then use those primitives to describe exactly how to build a sandwich than it is to write a massive specification that describes what they should produce and just let them figure it out.
2025 Taxes
Dumped all PDFs of all my tax forms into a single folder, asked Claude to rename them nicely. Asked it to use Gemini 2.5 Flash to extract all tax-relevant details from all statements / tax forms. Had it put together a web UI showing all income, deductions, etc. for the year. Had it estimate my 2025 tax refund / underpayment.
Result was amazing. I now actually fully understand the tax position. It broke down all the progressive tax brackets, added notes for all the extra federal and state taxes (e.g. Medicare, CA Mental Health tax, etc.).
Finally had Claude prepare all of my docs for upload to my accountant: FinCEN reporting, summary of all docs, etc.
Desk Fabrication
Planning on having a furniture maker fabricate a custom solid walnut desktop for a custom office standing desk. Want to create a STEP file of the exact cuts / bevels / countersinks / etc. to help with fabrication.
Worked with Codex to plan out and then build an interactive in-browser 3D CAD experience. I can ask Codex to add some component (e.g. a grommet) and it will generate parameterized B-rep geometry for that feature and then allow me to control the parameters live in the web UI.
Codex found the Open CASCADE Technology (OCCT) B-rep modeling library, which has a WebAssembly-compiled version, and integrated it.
Now have a WebGL view of the desk, can add various components, change their parameters, and see the impact live in 3D.
What scares me though is how I've (still) seen ChatGPT make up numbers in some specific scenarios.
I have a ChatGPT project with all of my bloodwork and a bunch of medical info from the past 10 years uploaded. I think it's more context than ChatGPT can handle at once. When I ask it basic things like "Compare how my lipids have trended over the past 2 years" it will sometimes make up numbers for tests, or it will mix up the dates on certain data points.
It's usually very small errors that I don't notice until I really study what it's telling me.
And also the opposite problem: A couple days ago I thought I saw an error (when really ChatGPT was right). So I said "No, that number is wrong, find the error" and instead of pushing back and telling me the number was right, it admitted to the error (there was no error) and made up a reason why it was wrong.
Hallucinations have gotten way better compared to a couple years ago, but at least ChatGPT seems to still break down especially when it's overloaded with a ton of context, in my experience.
1. I keep all my accounts in accounting software (originally Wave, then beancount)
2. Because everything is programmatically queryable, the data never enters token-space; only the schema and logic do
I then use tax software to prep my professional and personal returns. The LLM acts as a validator, and ensures I've done my accounts right. I have `jmap` pull my mail via IMAP, my Mercury account via a read-only transactions-only token and then I let it compare against my beancount records to make sure I've accounted for things correctly.
For the most part, you want it to be handling very little arithmetic in token-space, though the SOTA models can do it pretty flawlessly. I did notice that they would occasionally make arithmetic errors in numerical comparison. But when you use them as an assistant, you're not relying on them directly: they act as a hypothesis generator and a checker tool, and if you ask them to write out the reasoning they're pretty damned good.
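A minimal sketch of what "keep the arithmetic out of token-space" can look like: the comparison between ledger and bank records is done by a script, and the model only sees the (small) list of mismatches. The `reconcile` function, the tuple layout, and the sample data are all hypothetical; this is not the beancount or Mercury API.

```python
from collections import Counter
from decimal import Decimal


def reconcile(ledger, bank):
    """Return (missing_from_ledger, missing_from_bank) as sorted lists.

    Each transaction is a (date, amount) tuple. Amounts are Decimal so the
    comparison is exact rather than floating-point. Counters are used so
    that two identical transactions on the same day are matched one-to-one.
    """
    ledger_counts = Counter(ledger)
    bank_counts = Counter(bank)
    missing_from_ledger = sorted((bank_counts - ledger_counts).elements())
    missing_from_bank = sorted((ledger_counts - bank_counts).elements())
    return missing_from_ledger, missing_from_bank


# Hypothetical sample data, not real records.
ledger = [
    ("2025-01-03", Decimal("-42.10")),
    ("2025-01-05", Decimal("1500.00")),
]
bank = [
    ("2025-01-03", Decimal("-42.10")),
    ("2025-01-05", Decimal("1500.00")),
    ("2025-01-09", Decimal("-19.99")),
]

missing_from_ledger, missing_from_bank = reconcile(ledger, bank)
```

Here the only thing left for a human (or an LLM) to look at is the single unrecorded bank transaction, which is the tedious-to-find, trivial-to-verify part.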
For me Opus 4.6 in Claude Code was remarkable for this use-case. These days, I just run `,cc accounts` and then look at the newly added accounts in fava and compare with Mercury. This is one of those tedious-to-enter trivial-to-verify use-cases that they excel at.
To be honest, I was fine using Wave, but without machine-access it's software that's dead to me.
To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.
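For illustration, a rough sketch of what such an aggregation tool might end up looking like after a few iterations. The file names and fields are made up, and this is an assumption about the general shape, not the tool Claude actually wrote.

```python
import json
import tempfile
from pathlib import Path


def aggregate_json(folder):
    """Merge every *.json file in `folder` into one list of records.

    Files that fail to parse are collected in `errors` instead of aborting
    the run, so one bad extraction does not hide the rest.
    """
    records, errors = [], []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            errors.append((path.name, str(exc)))
            continue
        # Accept either a single object or a list of objects per file.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict):
                records.append({"source": path.name, **item})
            else:
                errors.append((path.name, "unexpected shape"))
    return records, errors


# Hypothetical extraction outputs, written to a temp folder for the demo.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "w2.json").write_text('{"form": "W-2", "wages": "1000.00"}')
    Path(tmp, "1099.json").write_text('[{"form": "1099-INT", "interest": "12.34"}]')
    Path(tmp, "broken.json").write_text("{not json")
    records, errors = aggregate_json(tmp)
```

Keeping the source file name on every record makes it easy to trace a suspicious number back to the document it came from.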
Which should pair well with the “write a script” tactic.
And it usually takes just as long.
You couldn’t do that with TurboTax or block’s tax file? You don’t have to submit or pay.
Hope you don't get audited.
I imagine your accountant had the same reaction I do when an amateur shows me their vibe codebase.
* https://www.stavros.io/posts/i-made-a-voice-note-taker/ - A voice note recorder.
* https://github.com/skorokithakis/stavrobot - My secure AI personal assistant that's made my life admin massively easier.
* https://github.com/skorokithakis/macropad - A macropad.
* https://github.com/skorokithakis/sleight-of-hand - A clock that ticks seconds irregularly but is accurate for minutes.
* https://pine.town - A whimsical little massively multiplayer drawing town.
* https://encyclopedai.stavros.io - A fictional encyclopedia.
* https://justone.stavros.io - A web implementation of the board game Just One.
* https://www.themakery.cc - The website and newsletter for my maker community.
* https://theboard.stavros.io - A feature board that implements itself.
* https://github.com/skorokithakis/dracula - A blood test viewer.
* https://github.com/skorokithakis/support-email-bot - An email bot to answer common support queries for my users.
Maybe some of these will beat the rap.
And even for the ones that might "beat the rap", I don't understand from your descriptions why they are interesting or unique. A voice note recorder? Cool. There are already hundreds if not thousands of those, why did you need to make your own in the first place? I'm not saying that yours isn't special, I'm just saying that it doesn't help to post the blandest description possible if you're trying to impress people with the utility of your utility.
Seems like the bar is now it has to be a mass market product. On another post someone else commented a SaaS doesn't count if it doesn't earn sustainable revenue.
I guess OpenClaw also doesn't count because we don't know how much Peter got from OpenAI.
This is an ideological flame war, not a rational discussion. There's no convincing anyone.
Moreover though, I'm not even saying you shouldn't do those things. I'm actually playing around with AI quite a bit, and certainly have created my share of useless/productivity tools. But it's not a flex to show off your own Flappy Birds or OpenNanoClaw clone, even if they are written in COBOL or MUMPS.
And they definitely do not have to be "extremely useful". But they should answer the question: what problem does it solve?
And it’s exactly what I expected - lines of code. Cute. But… so what? This is not good for the AI hype, nor for any continued support for future investment.
On the other hand all this stuff is going to drive continual innovation. The more tokens generated the more model producers invest. And we might eventually get to a place of local models.
And with AI the result of 99.9% is abandonware. Just piles of code no one will ever touch again.
Which proves the point of no productivity gains. It's just cheap dopamine hits.
[0] https://github.com/skorokithakis/dracula
That’s not even mentioning that this tool doesn’t do much beyond wrapping a call to Claude. And it’s using Claude to display blood test data to the end user. This is not something I’d trust an LLM to not mess up. You’d really want to double check every single result.
Constant enshittification and UI redesigns are driven by the provider to justify monthly extortion.
Sounds like something that could be tried as a fix for a kind of OCD (obsessive seconds counting).
Something like this would be anxiety-inducing for most people, I bet. That'd be an excellent experiment: track heart rate, EEG, and performance on a range of cognitive tasks with 2-minute breaks between tasks, with one group exposed to the irregular ticking, another to regular ticking, another to silence, and a last one to pleasant white noise.
It's just the right amount of "did that clock just skip a beat? Nah must just be my imagination".
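One way a clock could tick irregularly and still be accurate for minutes, sketched below: draw 60 jittered intervals, then rescale them so they sum to exactly one minute. This is only a guess at the general idea, not how the linked sleight-of-hand project does it; the function name and the `jitter` parameter are invented for the example.

```python
import random


def irregular_ticks(n_ticks=60, total_seconds=60.0, jitter=0.4, seed=None):
    """Return n_ticks positive intervals (seconds) that sum to total_seconds.

    Each interval is drawn around the mean with up to +/- `jitter` relative
    deviation, then the whole list is rescaled so the minute stays exact.
    `jitter` must be below 1 so that every interval stays positive.
    """
    rng = random.Random(seed)
    mean = total_seconds / n_ticks
    raw = [mean * (1 + rng.uniform(-jitter, jitter)) for _ in range(n_ticks)]
    scale = total_seconds / sum(raw)
    return [interval * scale for interval in raw]


# One minute's worth of uneven ticks; the seed is only for reproducibility.
intervals = irregular_ticks(seed=1)
```

A smaller `jitter` gives the "must just be my imagination" effect; a larger one makes the irregularity obvious.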
I get the sentiment, but this is natural with a groundbreaking new technology. We are still in the process of figuring out how to best apply generative LLMs in a productive way. Lots of people tinker and share their results. Most is surely hype and will get thrown away and forgotten soon, but some is solid. And I am glad for it, as I did not take part in that but now enjoy the results, since the agents have become really good.
This is exactly the same reason why the appropriate question to ask about Haskell is "where are the open source projects that are useful for something that is not programming?"
The answer for Haskell after 3 decades is very, very little. Pandoc, git-annex, xmonad. Might be something else since I last did the exercise but for Haskell the answer is not much. Then we examine why the kids (us kids of all ages) can't or don't write Haskell programs.
The answer for LLM coding may be very different. But the question "where is the software that does something that solves a problem outside its own orbit" is crucial. (You have a problem. You want to use foo to solve it, now you have two problems but you can use foo to solve a part of the second one!!)
The price of getting code written just went down. Where are the site/business launches? Apps? New ideas being built? Specifically. With links. Not general, hand-wavy "these are the sorts of things that ..." because even if it's superb analysis, without some data that can be checked it's indistinguishable from hype.
Whatever data we get will be very informative.
I looked into doing it manually, but gave up. Way too much dirty work and I had no energy for that.
Then I discovered that the Claude CLI got good - and told it to do it (with some handholding).
And it did it. Build process modernized. No more outdated dependencies. Then I added some features I missed in the original wick editor. Again, it did it and it works.
A working editor that was abandoned and missed features - now working again with the missing features. With minimal work done from my side (but I did put in work before to understand the source).
I call this a very useful result. There are lots of abandoned half-working projects out there. Lots of value to be recovered. Unlike Haskell, agents are not just busy with building agents, but real tools. Currently I have the agents refactor an old codebase of mine. Lots of tech debt. Lots of hacks. Bad documentation. There are features I wanted to implement for ages but never did as I did not want to touch that ugly code again. But Claude did it. It is almost scary what they are already capable of.
At work, I would say I've done plenty of "useful" things with AI, but that's hard to show off given that I work on an internal application.
Quite simply, I don't think that they are asking or arguing in good faith.
I chuckle when I see some of them because you could achieve the same (or often faster) result by jotting a note onto a notecard and sticking it in your pocket.
Most of the other automations running don't really seem to serve any real purpose at all.
But hey, if it's fun, have at it.
I'm starting to believe using them is more likely to make you obsolete than not.
From where I stand this thing is going to provide great leverage to those who don’t simply just write code. I personally doubt the thing will ever get to a place where it can be trusted to operate alone - it needs a team of people and to go super fast you need more people.
Moreover, the price won’t be high due to competition.
I’ve changed my view on LLMs as being good, as long as competition is fierce.
I do think it'll be a while before LLMs make significant contributions to complex projects, though. For example I can't imagine many maintainers of the Linux kernel use LLMs much.
I believe your skills are atrophying when you use these things no matter how trivial the case. That compounds with their bias towards solving problems by producing more code to further reduce your productivity without them.
I do agree with you to some extent. I think anyone who uses LLMs will need to set aside some time writing code by hand to keep their skills sharp.
What I find interesting is that I have little motivation to open source it. Making it usable for others requires a substantial amount of time, which would otherwise be just a fraction of the development time.
(I actually did write my own note-taking application, but that was before LLMs were any good at writing code.)
What do people think of it?
I personally don't think that's a badge of honor. Aside from losing your coding skills you miss opportunities to generate AI pieces and connect them to existing systems that can't be fed into the AI. Plus making small changes yourself is easier than having the AI make them without messing something else up.
I prefer having Claude make even small changes at this point since every change it makes ends up tweaking it to better understand something about my coding convention, standard, interpretation etc... It does pick up on these little changes and commits them to memory so that in the long run you end up not having to make any little changes whatsoever.
And to drive this point further, even prior to using LLMs, if I review someone's work and see even a single typo or something minor that I could probably just fix in a second, I still insist that the author is the one to fix it. It's something my mentor at Google did with me which at the time I kind of felt was a bit annoying, but I've come to understand their reason for it and appreciate it.
The second thing Claude Code does is that when it reaches the end of its context window it runs `/compact` on the session, which takes a summary of the current session, dumps it into a file, and then starts a new session with that summary. But it also retains logs of all the previous sessions that it can use and search through.
Looking over my session of Claude Code, out of the 256k tokens available, about 50k of these tokens are used among "memory" and session summaries, and 200k tokens are available to work with. The reality is that the vast majority of tokens Claude Code uses is for its own internal reasoning as opposed to being "front-end" facing so to speak.
Additionally given that ChatGPT Codex just increased its context length from 256k to 1 million tokens, I expect Anthropic will release an update within a month or so to catch up with their own 1 million token model.
1. The closer the context gets to full the worse it performs.
2. The more context it has the less it weights individual items.
That is, Claude might learn you hate long functions and add a line about short functions. When that is the only thing in the context, it is likely to follow it very closely. But when it’s one piece of a much longer context, it is much more likely to ignore it.
3. Tokens cost money even if you are currently being subsidized.
4. You have no idea how new models and new system prompt will perform with your current memory.md file.
5. Unlike learning something yourself, anything you teach Claude is likely to start being controlled by your employer. They might not let you take it with you when you go.
keep in mind that those 50k memory tokens would likely be cached after the first run and thus significantly cheaper
It still has gaps. I don't think they've landed on the right model for CI. Like Earthly, their model is a CI runner + local cache. I believe a distributed cache (like Bazel) makes more sense.
If I were choosing between the two I'd personally always pick Dagger, but I think there is a strong argument for Earthly for simpler projects. If you're using multiple Earthfiles or a few hundred lines of Earthly, I think you've outgrown it.
*Piece Together*
An animated puzzle game that I built with a fairly heavy reliance on agentic coding, especially for scaffolding. I did have to jump in and tweak some things manually (the piece-matching algorithm, responsive design, etc.), but overall I’d estimate that LLMs handled about 80% of the work. It's heavily based on the concept of animated puzzles in the early edutainment game The Island of Dr. Brain.
https://animated-puzzles.specr.net
*Lend Me Your Ears*
Lend Me Your Ears is an interactive web-based game inspired by the classic Simon toy (originally by Milton Bradley). It presents players with a sequence of musical notes and challenges them to reproduce the sequence using either an on-screen piano, MIDI keyboard, or an acoustic instrument such as a guitar.
https://lend-me-your-ears.specr.net
*Shâh Kur - Invisible Chess*
A voice-controlled blindfold chess game that uses novel approaches (last N pieces moved hidden, fade over time, etc). I've already been playing it daily on my walks.
https://shahkur.specr.net
*Word game to find the common word*
It's based off an old word game where one person tries to come up with three words: sign, watch, bus. The other person has to think of a common word that forms compound-style words with each of them: stop.
I was quite surprised to see that this didn't exist online already.
https://common-thread.specr.net
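The core check behind a game like this can be sketched in a few lines: a candidate is a valid answer if it forms a known compound with every clue, in either order. The function name and the tiny vocabulary below are hypothetical, and this is not the code behind the linked site.

```python
def common_links(clues, candidates, vocabulary):
    """Return candidates that form a known compound with every clue.

    A candidate counts if either "clue candidate" or "candidate clue"
    appears in `vocabulary` (a set of known compounds or phrases).
    """
    matches = []
    for candidate in candidates:
        if all(
            f"{clue} {candidate}" in vocabulary
            or f"{candidate} {clue}" in vocabulary
            for clue in clues
        ):
            matches.append(candidate)
    return matches


# Tiny hypothetical vocabulary; a real game would need a much larger list.
vocabulary = {"stop sign", "stop watch", "bus stop", "bus station", "watch tower"}
candidates = ["stop", "station", "tower"]

answer = common_links(["sign", "watch", "bus"], candidates, vocabulary)
```

The hard part in practice is the vocabulary, not the lookup: the puzzle is only fair if the compound list matches what players consider real words.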
*A Slide Puzzle*
Slide puzzles for qualified MENSA members. I built it for a friend who's basically a real-life equivalent of Dustin Hoffman's character from Rain Man. So you might have to rearrange a slide puzzle from the periodic table of elements, or the U.S. presidents by portrait, etc.
https://slide-puzzles.specr.net
*Glyphshift*
Transforms random words on web pages into different writing systems like Hiragana, Braille, and Morse Code to help you learn and practice reading them, so you can master the most functionally pointless tasks, like being able to read Braille visually.
https://github.com/scpedicini/glyph-shift
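The basic idea can be sketched outside the browser: pick words at random and swap them for their Morse equivalents. This is a Python illustration under my own assumptions (function names, the `probability` parameter), not the extension's actual code.

```python
import random
import re

# International Morse code for the letters a-z.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}


def to_morse(word):
    """Convert a word to Morse, one space between letters."""
    return " ".join(MORSE[ch] for ch in word.lower() if ch in MORSE)


def glyph_shift(text, probability=0.3, seed=None):
    """Replace each alphabetic word with Morse with the given probability."""
    rng = random.Random(seed)

    def swap(match):
        word = match.group(0)
        return to_morse(word) if rng.random() < probability else word

    return re.sub(r"[A-Za-z]+", swap, text)
```

For example, `to_morse("SOS")` gives `... --- ...`, and `glyph_shift(page_text, probability=0.1)` would convert roughly one word in ten.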
All of these were built with varying levels of assistance from agentic coding. None of them were purely vibe-coded and there was a great deal of manual and unit testing to verify functionality as it was built.
It also seems like none of them are relatively unique and all of them have been done before.
Simon toy that's integrated into an ear training tool?
Blindfold chess with Last N moves hidden?
Mensa-style slide puzzles?
An extension that converts random words into phonetic equivalents like morse, braille, and vorticon?
I've also made some way less useful stuff, like a win32 app that lets you physically grab a window and hurl it, which triggers a WM_DESTROY once the window is completely off the screen.
And an app that measures low frequencies to tell if you are blowing into the mic and then increases the speed of the CPU fan to cool it down.
This is a narrow view of software engineering. Thinking that your role is "code that works" is hardly better than thinking you're a "(human) resource that produces code". Your job is to provide value. You do that by building knowledge, not only of the system you're developing but of the problem space you're exploring, the customers you're serving, the innovations you can do that your competitors can't.
It's like saying that a soccer player's purpose is "to kick a ball" and therefore a machine that launches balls faster and further than any human will replace all soccer players, and soon all professional teams will be made up of robots.
Businesses wish this were the case, and many will even say it or start to believe it. But it doesn't bear out in practice.
Think about it this way, engineers are expensive so a company is going to want to have as few of them as possible to do as much work as possible. Long before LLMs came along there have been many rounds of "replace expensive engineers" fads.
Visual programming was going to destroy the industry, where any idiot could drag and drop a few boxes and put together software. Turns out that didn't work out and now visual programming is all but dead. Then we had consultants and software consultancies. Why keep engineers on staff and have to deal with benefits and HR functions when you can hire consultants for just long enough to get the job done and end their contracts. Then we had offshoring. Why hire expensive developers in markets like California when you can hire far cheaper engineers abroad in a country with lower wages and laxer employment law. (It's not a quality thing either, many of these engineers are unquestionably excellent.)
Or, think about what happens when software companies get acquired. It's almost unheard of for the acquiring company to lay off all of the engineering staff from the acquired company right away; if anything it's the opposite, with vesting incentives to convince engineers to stay.
If all that mattered was the code and the systems, and people were cogs that produced code that businesses wanted to optimise, then none of these actions make sense. You'd see companies offshore and use consultants with the company that does "good enough" as cheaply as possible. You'd see engineers from acquisitions be laid off immediately, replaced with cheaper staff as fast as possible.
There are businesses that operate like this; it happens all the time. But all of the most successful and profitable tech companies in the world don't do this. Why?