Hacker News

If AI writes code, should the session be part of the commit?

494 points by mandel_x 3 days ago | 388 comments

jedberg 3 days ago

The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md to describe the changes it will make (or what it will create if Greenfield).

I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.

Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.

I then commit the project.md and plan.md along with the code.

So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.

The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find it's own mistakes.

jumploops 2 days ago

I do something similar, but across three doc types: design, plan, and debug

Design works similar to your project.md file, but on a per feature request. I also explicitly ask it to outline open questions/unknowns.

Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).

The plan docs are structured like `plan/[feature]/phase-N-[description].md`

From here, the agent iterates until the plan is "done" only stopping if it encounters some build/install/run limitation.

At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

We review these hypotheses, sometimes iterate, and then tackle them one by one.

An important note for debug flows, similar to manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis, before moving directly to a fix.

Using this method has led to a 100% vibe-coded success rate both on greenfield and legacy projects.

Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

miki123211 2 days ago

My "heavy" workflow for large changes is basically as follows:

0. create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file etc.

1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.

2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.

3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.

4. Given the functional spec, figure out the technical architecture, same workflow as above.

5. High-level plan.

6. Detailed plan for the first incomplete high-level step.

7. Implement, manually review code until satisfied.

8. Go to 6.

jedberg 2 days ago

> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.

You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.

We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows

zknill 2 days ago

Why an MCP? dbos already ships a cli that appears to have the same features. Why an MCP over a skill that gives context on using the cli?

https://docs.dbos.dev/python/reference/cli

jumploops 2 days ago

> we launched both a skill and MCP server.

My guess is that the MCP was easy enough to add, and some tools only support MCP.

Personal opinion: MCP is just codified context pollution.

jumploops 2 days ago

This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.

With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.

As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo" like effect. Ended up going through 8 debug docs[0] until I was satisified.

[0]https://github.com/jumploops/slop.haus/tree/main/debug

nubinetwork 2 days ago

> giving agents access to logs (dev or prod) tightens the debug flow substantially.

Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...

frumiousirc 2 days ago

> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.

Your current file-system "UI" vs Beads command line UI is obviously a big difference.

Beads provides a kind of conceptual bottleneck which I think helps when using with LLMs. Beads more self-documenting while a file-system can be "anything".

wek 2 days ago

Similar, but we have the agent write the test cases after writing the plan and then iterate until it passes the test cases.

danenania 2 days ago

I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.

Better imo is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than bunch of meandering planning files.

frank00001 3 days ago

Sounds like the spec driven approach. You should take a look at this https://github.com/github/spec-kit

kriro 2 days ago

I basically use a spec driven approach except I only let Github Spec Kit create the initial md file templates and then fill them myself instead of letting the agent do it. Saves a ton of tokens and is reasonably quick and I actually know I wrote the specs myself and it contains what I want. After I'm happy with the md file "harness" I let the agents loose.

The most frustrating issues that pop up are usually library/API conflicts. I work with Gymnasium or PettingZoo and Rlib or stablebaselines3. The APIs are constantly out of sync so it helps to have a working environment were libraries and APIs are in sync beforehand.

jedberg 2 days ago

Sort of, depending on if your spec includes technology specifics.

For example it might generate a plan that says "I will use library xyz", and I'll add a comment like "use library abc instead" and then tell it to update the plan, which now includes specific technology choices.

It's more like a plan I'd review with a junior engineer.

I'll check out that repo, it might at least give me some good ideas on some other default files I should be generating.

WXLCKNO 2 days ago

or OpenSpec https://github.com/Fission-AI/OpenSpec/

I think it's much better

shinycode 2 days ago

Thanks for the link ! I’m very curious about their choices and methods, I’ll try it

wolletd 3 days ago

> 110 releases in 6 months

sethammons 2 days ago

Almost a release per work day, esp. if you count standard holidays.

malloryerik 2 days ago

Have you tried this? Review?

dmd 2 days ago

https://github.com/obra/superpowers "brainstorming" is pretty much exactly this workflow, and it's great.

nesarkvechnep 2 days ago

By that time you would’ve written the code yourself, only better.

cortesoft 2 days ago

I am sure this is partly tongue in cheek, but no, you can’t have written the code yourself in that amount of time. Would the code be better if you wrote it? Probably, depending on your coding skills.

But it would not be faster.

OP is talking about creating an entire project, from scratch, and having it feature complete at the end.

smohare 2 days ago

[dead]

shinycode 2 days ago

I also do that and it works quite well to iterate on spec md files first. When every step is detailed and clear and all md files linked to a master plan that Claude code reads and updates at every step it helps a lot to keep it on guard rails. Claude code only works well on small increments because context switching makes it mix and invent stuff. So working by increments makes it really easy to commit a clean session and I ask it to give me the next prompt from the specs before I clear context. It always go sideways at some point but having a nice structure helps even myself to do clean reviews and avoid 2h sessions that I have to throw away. Really easier to adjust only what’s wrong at each step. It works surprisingly well

anbende 2 days ago

Here’s how I do the same thing, just with a slightly different wrapper: I’m running my own stepwise runtime where agents are plugged into defined slots.

I’ll usually work out the big decisions in a chat pane (sometimes a couple panes) until I’ve got a solid foundation: general guidelines, contracts, schemas, and a deterministic spec that’s clear enough to execute without interpretation.

From there, the runtime runs a job. My current code-gen flow looks like this: 1. Sync the current build map + policies into CLAUDE|COPILOT.md 2. Create a fresh feature branch 3. Run an agent in “dangerous mode,” but restricted to that branch (and explicitly no git commands) 4. Run the same agent again—or a different one—another 1–2 times to catch drift, mistakes, or missed edge cases 5. Finish with a run report (a simple model pass over the spec + the patch) and keep all intermediate outputs inspectable

And at the end, I include a final step that says: “Inspect the whole run and suggest improvements to COPILOT.md or the spec runner package.” That recommendation shows up in the report, so the system gets a little better each iteration instead of just producing code.

I keep tweaking the spec format, agent.md instructions and job steps so my velocity improves over time.

--- To answer the original article's question. I keep all the run records including the llm reasoning and output in the run record in a separate store, but it could be in repo also. I just have too many repos and want it all in one place.

CompoundLoop 2 days ago

What store do you use for your run records? A separate git repo? or do you have some SQL lite db holding the records.

anbende 2 days ago

Hi there. Right now they are going to a separate git repo, yes. Like this:

local-governor/epics/e-epics/e014-clinical-domain-model/runs/run-e014-01-ops-catalog-20260302-173907-244c82

- Attempts

+ Steps

  - Step 1

  - Step 2

  - ...

  - Step 13

job_def.yaml

job_instance.json

changes_final.patch

run_report.md

improvement_suggestions.md

local-governor is my store for epics, specs, run records, schemas, contracts, etc. No logic, just files. I want all this stuff in a DB, but it's easier to just drop a file path into my spec runner or into a chat window (vscode chat or cli tool), but I'm tinkering with an alt version on a cloud DB that just projects to local files... shrug. I spend about as much time on tooling as actual features :)

RHSeeger 2 days ago

I do something similar - A full work description in markdown (including pointers to tickets, etc); but not in a file - A "context" markdown file that I have it create once the plan is complete... that contains "everything important that it would need to regenerate the plan" - A "plan" markdown file that I have it create once the plan is complete

The "context" file is because, sometimes, it turns out the plan was totally wrong and I want to purge the changes locally and start over; discussing what was done wrong with it; it gives a good starting point. That being said, since I came up with the idea for this (from an experience it would have been useful and I did not have it) I haven't had an experience where I needed it. So I don't know how useful it really is.

None of that ^ goes into the repo though; mostly because I don't have a good place to put it. I like the idea though, so I may discuss it with my team. I don't like the idea of hundreds of such files winding up in the main branch, so I'm not sure what the right approach is. Thank you for the idea to look into it, though.

Edit: If you don't mind going into it, where do you put the task-specific md files into your repo, presumably in a way that doesn't stack of over time and cause ... noise?

giancarlostoro 2 days ago

This is how I used to use Beads before I made GuardRails[0]. I basically iterate with the model, ask it to do market research, review everything it suggests, and you wind up with a "prompt" that tells it what to do and how to work that was designed by the model using its own known verbiage. Having learned about how XML could be used to influence Claude I'm rethinking my flow and how GuardRails behaves.

[0]: https://giancarlostoro.com/introducing-guardrails-a-new-codi...

8note 2 days ago

the real question is when peer feedback and review happens.

is making the project file collaborative between multiple engineers? the plan file?

ive tried some variants of sharing different parts but it feels like ots almost water effort if the LLM then still goes through multiple iterations to get whats right, the oroginal plan and project gets lost a bit against the details of what happened in the resulting chat

adam_patarino 2 days ago

You check the plan files into git? Don’t you end up with dozens of md files?

I’ve been copying and pasting the plan into the linear issue or PR to save it, but keep my codebase clean.

thearn4 2 days ago

Yeah I had the same question. I suppose you could put the project+plan text into the commit message?

winwang 2 days ago

Interesting! I actually split up larger goals into two plan files: one detailed plan for design, and one "exec plan" which is effectively a build graph but the nodes are individual agents and what they should do. I throw the two-plan-file thing into a protocol md file along with a code/review loop.

the-grump 3 days ago

Stealing this brilliant idea. Thank you for sharing!

jedberg 2 days ago

I wish I could say I came up with it, but it's just a small variation on something I saw here on HN!

peyton 3 days ago

For big tasks you can run the plan.md’s TODOs through 5.2 pro and tell it to write out a prompt for xyz model. It’ll usually greatly expand the input. Presumably it knows all the tricks that’ve been written for prompting various models.

odiroot 2 days ago

How do you use your agent effectively for executing such projects in bigger brownfield codebases? It's always a balance between the agent going way too far into NIH vs burning loads and loads of tokens for the initial introspection.

tlb 2 days ago

Do you clear the file and use the same name for the next commit? Or create a new directory with a plan.md for each set of changes?

stackghost 3 days ago

>I then iterate on that plan.md with the AI until it's what I want.

Which tools/interface are you using for this? Opencode/claude code? Gas town?

StrangeSound 3 days ago

I find that Antigravity is really good for this. You can comment on the plan documents in-line.

d1sxeyes 2 days ago

Best feature of Antigravity

anshumankmr 3 days ago

While I have not commited my personal mind map, I just had Claude Code write it down for me. Plus I have a small Claude.MD, copilots-innstructions.md that are mentioning the various intricacies of what I am working on so the agent knows to refer to that file.

jedberg 2 days ago

I'm using the Claude desktop app and vi at the moment. But honestly I would probably do better with a more modern editor with native markdown support, since that's mostly what I'm writing now.

fhub 2 days ago

I do something similar but I get Claude to review Codex every step of the way and feed it back (or visa versa depending on day)

jedberg 2 days ago

My next step was to add in having another LLM review Claude's plans. With a few markdown artifacts it should be easy for the other LLM to figure it out and make suggestions.

vorticalbox 2 days ago

you may like openspec[0]

[0] https://openspec.dev/

iainmck29 2 days ago

is this not what entire.io is doing? Was founded by the old Github CEO Thomas Dohmke

plsft 2 days ago

Yes, when I first saw this, its exactly what I thought of.

moderation 2 days ago

No mention of Agent Trace [0] yet. Interestingly, Entire are not supporting Agent Trace [1]

0. https://agent-trace.dev/

1. https://github.com/entireio/cli/issues/386

esafak 2 days ago

Their response seems reasonable.

Bombthecat 2 days ago

Then you might like to look into automaker.

matkoniecz 2 days ago

I do the same, but put it as a comment on top of generated file.

(So far I have not used LLMs to generate code larger than fitting in one file.)

Overall idea is that I modify and tweak prompt, and keep starting new LLM sessions and dispose of old ones.

ryanmcl 2 days ago

[dead]

827a 3 days ago

IMO: This might be a contrarian opinion, but I don't think so. Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit. The answer to this granularity is, much like anything, you have to think of the audience: Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them; they likely contain a significant amount of noise, incorrect implementations, and red herrings. The product of the session is what matters.

I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.

arppacket 3 days ago

While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.

Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identity where the generated code could have deviated from user intention. That might be useful for generating code, or just more efficient analysis by focusing on possible "hotspots", etc.

Basically, it's high time we start capturing any and all human input for future models, especially open source model development, because I'm sure the companies already have a bunch of this kind of data.

woctordho 2 days ago

That's exactly one of the reasons I've been archiving the sessions using DataClaw. The sessions can contain more useful information than the comments for humans.

[0] https://github.com/peteromallet/dataclaw

staticassertion 2 days ago

TBH I don't think it's worth the context space to do this. I'm skeptical that this would have any meaningful benefits vs just investing in targeted docs, skills, etc.

I already keep a "benchmarks.md" file to track commits and benchmark results + what did/ did not work. I think that's far more concise and helpful than the massive context that was used to get there. And it's useful for a human to read, which I think is good. I prefer things remain maximally beneficial to both humans and AI - disconnects seem to be problematic.

arppacket 2 days ago

Might not be worth it now, but might be in future. Not just for future LLMs, but future AI architectures.

I don't think the current transformers architecture is the final stop in the architectural breakthroughs we need for "AGI" that mimics human thought process. We've gone through RNN, LSTM, Mamba, Transformers, with an exponentially increasing amounts of data over the years. If we want to use similar "copy human sequences" approaches all the way to AGI, we need to continuously record human thoughts, so to speak (and yes, that makes me really queasy).

So, persisting the session, that's already available in a convenient form for AI, is also about capturing the human reasoning process during the session, and the sometimes inherent heuristics therein. I agree that it's not really useful for humans to read.

staticassertion 2 days ago

I just don't really see the point in hedging like that tbh. I think you could justify almost anything on "it could be useful", but why pay the cost now? Eh.

aiisjustanif 11 hours ago

Optimizing and over-engineering to soon has gone out the window

JeremyNT 2 days ago

But AI can just read the diff. The natural language isn't important.

serial_dev 2 days ago

Or just "write a good commit message based on our session, pls", then both humans and llms can use it.

ZeroGravitas 2 days ago

Similarly, git logs of existing human code seem to be a good source of info that llms don't look at unless explicitly prompted to do so.

arppacket 2 days ago

Right now, it might not be worth the cost. That might change in future so that they consider it by default?

JustFinishedBSG 2 days ago

> While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

Context rot is very much a thing. May still be for future agents. Dumping tens/hundreds of thousand of trash tokens into context very much worsen the performance of the agent

jfoster 2 days ago

Future AIs can probably infer the requirements better than humans can write them.

nsonha 2 days ago

It's just noise for AI too. There is no reason to be lazy with context management when you can simply ask the AI to write the summary of the session. But even that is hardly useful when AI can just read the source of truth which is the code and committed docs

eru 3 days ago

> Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit.

Hmm, I think that's the wrong comparison? The more useful comparison might be: should all your notes you made and dead ends you tried become part of the commit?

panarky 3 days ago

When a human writes the code should all their slack messages about the project be committed into the repo?

blharr 3 days ago

Ideally, yes? Or a reference ticket number pointing to that discussion

The main limitation is the human effort to compile that information, but if the LLM already has the transcript ready, its free

woctordho 2 days ago

Ideally, yes. Although Slack is a vendor lock-in and we need a better platform to archive the sessions.

woctordho 2 days ago

Here is a recent example that lack of archived discussion causes problem: https://github.com/triton-lang/triton/issues/9539

eru 2 days ago

You can scrape slack when you make the commit, and archive that.

fragmede 2 days ago

That would be amazing! In the moment, it's a lot of noise, but say you're trying to figure out a bit of code that Greg wrote four years ago and oh btw he's no longer with the company. Having access to his emails and slack would be amazing context to try reverse engineer and figure out whytf he did what he did. Did he just pick a thing and run with it, so I can replace it and not worry about it, or was it a very intentional choice and do not replace, because everything else will break?

mocamoca 3 days ago

In some cases this is what I ask from my juniors. Not for every commit, but during some specific reviews. The goal is to coach them on why and how they got a specific result.

adithyassekhar 2 days ago

What is a junior? I don't see it in claude.

kubanczyk 2 days ago

It's how a middle manager can improve its standing, so the Junior will be a thing in bigger orgs for quite a while.

adithyassekhar 2 days ago

Both the manager and junior are a cost center for the company tbh if there are fewer employees to manage. Already seeing it here on this side of the pond: https://www.reddit.com/r/developersIndia/comments/1rinv3z/ju...

kubanczyk 21 hours ago

Companies (C-suites) do not actually want for their worker pool (humans + agents) to stay constant in time, there is no reason for it to stay constant in time. C-suites have very different worries.

And "cost center" is a lie from Outsourcing Era, forget about it.

rzerowan 3 days ago

This is a central problem that weve already seen proliferate wildly in Scientific research , and currently if the same is allowed to be embedded in foundational code. The future outlook would be grim.

Replication crisis[1].

Given initial conditions and even accounting for 'noise' would a LLm arrive at the same output.It should , for the same reason math problems require one to show their working. Scientific papers require the methods and pseudocode while also requireing limitations to be stated.

Without similar guardrails , maintainance and extension of future code becomes a choose your own adventure.Where you have to guess at the intent and conditions of the LLM used.

[1] https://www.ipr.northwestern.edu/news/2024/an-existential-cr...

827a 3 days ago

Agentic engineering is fundamentally different, not just because of the inherent unpredictability of LLMs, but also because there's a wildly good chance that two years from now Opus 4.6 will no longer even be a model anyone can use to write code with.

majormajor 3 days ago

You can leave commit messages or comments without spamming your history with every "now I'm inspecting this file..." or "oops, that actually works differently than I expected" transcript.

In fact, I'd wager that all that excess noise would make it harder to discern meaningful things in the future than simply distilling the meaningful parts of the session into comments and commit messages.

AlexCoventry 3 days ago

IMO, you should do both. The cost of intellectual effort is dropping to zero, and getting an AI to scan through a transcript for relevant details is not going to cost much at all.

devmor 3 days ago

Those messages are part of the linguistic context used to generate the code, though. Don’t confuse them for when humans (or human written programs) display progress messages.

If they aren’t important for your specific purposes, you can summarize them with an LLM.

JustFinishedBSG 2 days ago

> for the same reason math problems require one to show their working.

We don't put our transitional proofs in papers, only the final best one we have. So that analogy doesn't work.

For every proof in a paper there is probably 100 non-working / ugly sketches or just snippets of proofs that exist somewhere in a notebook or erased on a blackboard.

veunes 2 days ago

Even if you pin the seed and spin up your own local LLM, changes to continuous batching at the vLLM level or just a different CUDA driver version will completely break your bitwise float convergence. Reproducibility in ML generation is a total myth, in prod we only work with the final output anyway

itemize123 3 days ago

but we've been doing the same without llm. what're the new pieces which llm would bring in?

rzerowan 3 days ago

with normal practice , say if im reading through the linux source for a particular module.Id be able to refernce mailing lists and patchsets which by convention have to be human parsable/reviewable.Wit the history/comments/git blame etc putting in ones headspace the frame of reference that produced it.

Muromec 3 days ago

There is some potential value for the audit if you work in a special place where you are sworn in and where transparency is important, but who gonna read all of that and how do you even know that the transcript corresponds to the code if the committer is up to something

solarkraft 3 days ago

I agree that probably not everything should be stored - it’s too noisy. But the reason the session is so interesting is precisely the later part of the conversation - all the corrections in the details, where the actual, more precise requirements crystallize.

insin 3 days ago

AKA the code. You're all talking about the code.

medstrom 2 days ago

The prompt is the code :) The code is like a compiled binary. How long until we put the prompts in `src/` and the code in `bin/`, I wonder...

kubanczyk 2 days ago

I call out false dilemma. OP probably defines "code" as one of the languages precise enough to be suited for steering Turing machines. Thus, "code" is not the opposite of "prompt". They are apples and oranges.

Lawyers can code in English, but it is not to layperson's advantage, is it?

And for example, if you prompt for something to frobnicate biweekly, there is no intelligence today, and there will never be, to extract from it whether you want the Turing machine to act twice a week or one per two weeks. It's a deficiency of language, not of intelligence.

solarkraft 2 days ago

Not at all, unless it contains very thorough reasoning comments (which arguably it should). The code is only an artifact, a lot of which is incidental and flexible. The prompts contain the actual constraints.

whywhywhywhy 2 days ago

People are trying to retain value as their value is being evaporated.

slashdave 3 days ago

Then just summarize the final requirements

solarkraft 2 days ago

That’s what I do! I think it works well and helps future agents a lot in understanding why the codebase is the way it is. I do have to oversee the commit messages, but it does avoid a lot of noise and maybe it’s a normal part of HITL development.

lsaferite 2 days ago

If it's non-trivial work, have the Agent distill it down to an ADR.

JeremyNT 2 days ago

I think this too. I use the initial spec from the issue tracker as the prompt and work from there.

The missteps the agent takes and the nudging I do along the way are ephemeral, and new models and tooling will behave differently.

If you have the original prompt and the diff you have everything you need.

stackghost 3 days ago

LLM session transcripts as part of the commit is a neat idea to consider, to be sure, but I know that I damn well don't want to read eight pages of "You're absolutely right! It's not a foo. It's a bar" slop (for each commit no less!) when I'm trying to find someone to git blame.

The solution is as it always has been: the commit message is where you convey to your fellow humans, succinctly and clearly, why you made the commit.

I like the idea of committing the initial transcript somewhere in the docs/ directory or something. I'll very likely start doing this in my side projects.

matchagaucho 3 days ago

For me, it’s about preserving optionality.

If I can run resume {session_id} within 30 days of a file’s latest change, there’s a strong chance I’ll continue evolving that story thread—or at least I’ve removed the friction if I choose to.

majormajor 3 days ago

It seems unlikely that a file that hasn't changed in 30 days in an environment with a lot of "agents" cranking away on things is going to be particularly meaningful to revisit with the context from 30 days ago, vs using new context with everything that's been changed and learned since then.

xlii 2 days ago

> Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit.

As a huge fan of atomic commits I'd say that smallest logical piece should be a commit. I never seen "intention-in-a-commit", i.e. multiple changes with overarching goal influence reviews. There's usually some kind of ticket that can be linked to the code itself if needed.

D-Machine 3 days ago

First N prompts is a good / practical heuristic for something worth storing (whether N = 1 or greater).

wickedsight 2 days ago

> Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them

I disagree. When working on legacy code, one of my biggest issues is usually the question 'why is this the way it is?' Devs hate documentation, Jira often isn't updated with decisions made during programming, so sometimes you just have to guess why 'wait(500)' or 'n = n - 1' are there.

If it was written with AI and the conversation history is available, I can ask my AI: 'why is this code here?', which would often save me a ton of time and headache when touching that code in the future.

notedbrew 3 days ago

You ignore the reality of vibe coding. If someone just prompts and never reads the code and tests the result barely, then the prompts can be a valuable insight.

But I am not rooting for either, just saying.

refactor_master 3 days ago

If A vibes, and B is overwhelmed with noise, how does B reliably go through it? If using AI, this necessarily faces the same problems that recording all A's actions was trying to solve in the first place, and we'd be stuck in a never-ending cycle.

We could also distribute the task to B, C, D, ... N actors, and assume that each of them would "cover" (i.e. understand) some part of A's output. But this suddenly becomes very labor intensive for other reasons, such as coordination and trust that all the reviewers cover adequately within the given time...

Or we could tell A that this is not a vibe playground and fire them.

dang 3 days ago

I floated that idea a week ago: https://news.ycombinator.com/item?id=47096202, although I used the word "prompts" which users pointed out was obsolete. "Session" seems better for now.

The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,

(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and

(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.

At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].

I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.

But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.

So, community: what should we do?

[1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.

YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.

[2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)

amarant 3 days ago

My current thinking is based on boris tanes[1] formalised method of coding with Claude code. I commit the research and plan.md files as they are when I finally tell Claude to implement changes in code. This becomes a living lexicon of the architecture and every feature added. A very slight variation I do from Boris's method is that I prefix all my research and plan .md filenames with the name of the feature. I can very quickly load relevant architecture into context by having Claude read a previous design document instead of analysing the whole code base. I'll take pieces I think are relevant and tell Claude to base research from those design documents.

[1] https://boristane.com/blog/how-i-use-claude-code/

majormajor 3 days ago

> But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.

IMO it's not the lack of context that makes them uninteresting. It's the fact that the bar for "this took effort and thought to make" has moved, so it's just a lot easier to make things that we would've considered interesting two years ago.

If you're asking HN readers to sift through additional commit history or "session transcripts" in order to decide if it's interesting, because there's a lot of noise, you've already failed. There's gonna be too much noise to make it worth that sifting. The elevator pitch is just gonna need to be that much different from "vibe coded thing X" in order for a project to be worth much.

sillysaurusx 3 days ago

Unfortunately Codex doesn’t seem to be able to export the entire session as markdown, otherwise I’d suggest encouraging people to include that in their Show HNs. It’s kind of nuts that it’s so difficult to export what’s now a part of the engineering process.

I don’t have anything against vibe coded apps, but what makes them interesting is to see the vibe coding session and all the false starts along the way. You learn with them as they explore the problem space.

dang 3 days ago

mthurman pointed me to https://static.simonwillison.net/static/2025/claude-code-mic... - is that what you have in mind?

sillysaurusx 3 days ago

Yeah! That’s great. Having those alongside vibe coded apps would make them way more interesting.

duggan 2 days ago

I've been tinkering away on one of these myself, https://rockstar.ninja. I expect there are a hundred others out there, going to be interesting to see what the end shape of these tools is.

esperent 3 days ago

I don't think it's hard to export, on the contrary its all already saved it your ~/.claude which so you could write up a tool to convert the data there to markdown.

woctordho 2 days ago

You can export it with DataClaw. By default it outputs jsonl and publishes to HuggingFace, but you can also do analysis locally with it.

maxbond 2 days ago

> So, community: what should we do?

My diagnosis is that the friction that existed before (the effort to create a project) was filtering out low-effort projects and keeping the amount of submissions within the capacity the community to handle. Now that the friction is greatly reduced, there's more low-effort content and it's beyond the community's capacity (which is the real problem).

So there's two options: increase the amount of friction or increase the capacity. I don't think the capacity options are very attractive. You could add tags/categories to create different niches/queues. The most popular tags would still be overwhelmed but the more niche ones would prosper. I wouldn't mind that but I think it goes against the site's philosophy so I doubt you'll be interested.

So what I would propose is to create a heavier submission process.

- Make it so you may only submit 1 Show HN per week.

- Put it into a review queue so that it isn't immediately visible to everyone.

- Users who are eligible to be reviewers (maybe their account is at least a year old with, maybe they've posted to Show HN at least once) can volunteer to provide feedback (as comments) and can approve of the submission.

- If it gets approved by N people, it gets posted.

- If the submitter can't get the approvals they need, they can review the feedback and submit again next week.

High effort projects should sail through. Projects that aren't sufficently effortful or don't follow the Show HN guidelines (eg it's account walled) get the opportunity to apply more polish and try again.

A note on requirements for reviewers: A lot of the best comments come from people with old accounts who almost never post and so may have less than 100 karma. My interpretation is that these people have a lot of experience but only comment when they have an especially meaningful contribution. So I would suggest having requirements for account age (to make it more difficult to approve yourself from a sockpuppet) but being very flexible with karma.

tempestn 3 days ago

Why does the regular voting system fail here? Are there just too many Show HNs for people to process the new ones, so the good ones get lost in the noise?

dang 3 days ago

Yes I believe that's it.

th0ma5 3 days ago

[dead]

grey-area 3 days ago

1. Comments - Ban fully automated HN comments/accounts - can’t think of any reason to allow these or others to have to read them.

2. Require submissions which use GAI to have a text tag in title Show HN GAI would be fine for example - this would be a good first step and can be policed by readers mostly.

I do think point 1 is important to prevent fully automated voting rings etc.

Point 2 is preparation for some other treatment later - perhaps you could ask for a human written explanation on these ones?

I don’t think any complex or automated requirements are going to be enforceable or done so keep it simple. I also wonder whether show posts are enough - I’ve noticed a fair few blogspam posts using AI to write huge meandering articles.

airstrike 2 days ago

1. I think at a minimum we need a separate "Show HN" for AI posts, that people can filter out, so that users are not incentivized to spam Show HNs hoping to make it to the front page

2. Then that separate group, call it "Vibe HN", gets to decide what they find valuable through their own voting and flagging.

Some guidelines on what makes a good "Vibe HN" post would be helpful to nudge the community towards the things you're suggesting, but I think (1) cutting off self-promotion incentives given the low cost of creating software now and (2) allowing for self-moderation given the sheer number of submissions is the only tenable path

Lerc 2 days ago

From my perspective, I have two projects that I have considered [Show HN] posts for. One of those I have not yet posted because I have not yet completed writing up the process I used to construct it (a non-trivial project in an artifact). Without that commentary it falls into a different class, which i agree shouldn't be outright excluded, but is of less general interest. The other project I think some people would be interested in it just for what it is in itself, I just want to add a bit more to it.

Perhaps [Show HN] for things that have commentary or highlight a particular thing. It's a bit nebulous because it gets to be like Wikipedia's notability and is more of a judgement call.

But if that is backed up with a [Creations], simply for things that have been made that people might like or because you are proud of your achievement.

So if you write a little Chess engine, it goes under [Creations]. If it is a Chess engine in 1k, or written in BrainFuck, or has a discussion on how you did it, it goes under [Show HN]

[Creations] would be much less likely to hit the front page of course, but I think there might need a nudge to push the culture towards recognising that being on the front page should not be the goal.

For reference here are the two things, coming to a [Show HN] near you (maybe).

https://fingswotidun.com/PerfBoard/ (Just an app, Commentary would be the value.)

https://lerc.neocities.org/ (this is just neat (to a certain mind anyway), awaiting some more polish)

wging 3 days ago

Regarding the noise you mention, I wonder if memento's use of the git 'notes' feature is an acceptable way to contain or quarantine that noise. It might still not add much value, but at least it would live in a separate place that is easily filtered out when the user judges it irrelevant. Per the README of the linked repo,

> It runs a commit and then stores a cleaned markdown conversation as a git note on the new commit.

So it doesn't seem that normal commit history is affected - git stores notes specially, outside of the commit (https://git-scm.com/docs/git-notes).

In fact github doesn't even display them, according to some (two-year-old) blog posts I'm seeing. Not sure about other interfaces to git (magit, other forges), but git log is definitely able to ignore them (https://git-scm.com/docs/git-log#Documentation/git-log.txt--...).

This doesn't mean the saved artifacts would necessarily be valuable - just that, unlike a more naive solution (saving in commit messages or in some directory of tracked files) they may not get in the way of ordinary workflows aside from maybe bloating the repo to some degree.

mandel_x 3 days ago

You are 100% and that’s why I chose git notes. If you do not sync them you have no knowledge of their existence.

trailblaze 2 days ago

[dead]

bandrami 3 days ago

Plenty of commits link to mailing list discussions about the proposed change, maybe something like that, with an archive of LLM sessions?

pjc50 2 days ago

All the agentic AI projects remind me of "draw the rest of the owl": https://knowyourmeme.com/memes/how-to-draw-an-owl - there's a lot of steps missing.

Unlike many people, I'm on the trailing edge of this. Company is conservative about AI (still concerned about the three different aspects of IP risk) and we've found it not very good at embedded firmware. I'm also in the set of people who've been negatively polarized by the hype. I might be willing to give it another go, but what I don't see from the impressive Show HN projects (e.g. the WINE clone from last week) is .. how do you get those results?

esperent 3 days ago

> the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't really add that much value.

This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.

killingtime74 3 days ago

Also the models change all the time and are not deterministic

tptacek 3 days ago

A starting point would be excluding Show HNs with generated READMEs, or that lack human-written explanations.

mandel_x 3 days ago

> people have been submitting so many Show HNs of generated projects

In this case, it was more of write the X language compiler using X. I had to prove to myself if keeping the session made sense, and what better way to do it than to vibe code the tool to audit vibe code.

I do get your point though

grayhatter 3 days ago

> So, community: what should we do?

> Is Show HN dead? No, but it's drowning

Is spam on topic? and are AI codegen bots part of the community?

To me, the value of Show HN was rarely the thing, it was the work and attention that someone put into it. AI bot's don't do work. (What they do is worth it's own word, but it's not the same as work).

> I don't want to exclude these projects, because (1) some of them are good,

Most of them are barely passable at best, but I say that as a very biased person. But I'll reiterate my previous point. I'm willing to share my attention with people who've invested significant amounts of their own time. SIGNIFICANT amounts, of their time, not their tokens.

> (2) there's nothing wrong with more people being able to create and share things

This is true, only in isolation. Here, the topic is more, what to do about all this new noise, (not; should people share things they think are cool). If the noise drowns out the signal, you're allowed that noise to ruin something that was useful.

> (3) it's foolish to fight the future

coward!

I do hope you take that as the tongue-in-cheek way I meant it, because I say it as a friend would; but I refuse to resign myself completely to fatalism. Fighting the future is different from letting people doing something different ruin the good thing you currently have. Sure electric cars are the future, but that's no reason to welcome them in a group that loves rebuilding classic hot rods.

> (4) there's no obvious way to exclude them anyhow.

You got me there. But then, I just have to take your word for it, because it's not a problem I've spent a lot of time figuring out. But even then, I'd say it's a cultural problem. If people ahem, in a leadership position, comment ShowHN is reserved for projects that took a lot of time investment, and not just ideas with code... eventually the problem would solve itself, no? The inertia may take some time, but then this whole comment is about time...

I know it's not anymore, but to me, HN still somehow, feels a niche community. Given that, I'd like to encourage you to optimize for the people who want to invest time into getting good at something. A very small number of these projects could become those, but trying to optimize for best fairness to everyone, time spent be damned... I believe will turn the people who lift the quality of HN away.

d--b 2 days ago

The issue is that there is more HN submissions than the community is able to process. But you could say the same of the front page, which is mostly a fairly small sample of the good stuff that go through /new

So you could treat Show HN as the same. Like what gets floated on /show is only a small sample of the good stuff in /shownew and be fine with the idea that a lot of the good Show HN just slip through the crack. Which seems to me like the best alternative. Possibly with a /showpool maybe?

You could split Show HN into categories, but you'd have done it by now if you thought it a good idea.

You could also rate Show HN submissions algorithmically trying to push for those projects that have been around longer and that look like more effort has been put into them, but I guess that's kind of hard.

Or you'd have to hire actual people to pre-sort the submissions, and gut all the ones that are not up-to-par. In fact, if there was a human-based approval system for new Show HN, you'd possibly get a lot fewer submissions and more qualitative ones, which in itself would make the work of sorting through them simpler.

adampunk 2 days ago

Where is this deluge tho? In the last week how many have we seen hit the front page? A dozen? That Mathematica clone, the ZX spectrum emulator, the poorly named rtk, and…like 1-2 more are what I can remember from the last week that got popular.

That’s…pretty manageable.

d--b 2 days ago

They’re in /shownew. the last 30 Show HN submissions were sent in the last two hours. I think that’s a lot more than what we used to see.

adampunk 18 hours ago

OK, so I just stopped in there and I saw two projects that looked like they were generated by AI. The real flood seems to be scam blog posts.

I guess from reading about it, It would seem like this is a four-alarm fire. But I don’t see that when I go look around.

jgraham 2 days ago

> it's foolish to fight the future

And yet, the premise of the question assumes that it's possible in this case.

Historically having produced a piece of software to accomplish some non-trivial task implied weeks, months, or more of developing expertise and painstakingly converting that expertise into a formulation of the problem precise enough to run on a computer.

One could reasonably assume that any reasonable-looking submission was in fact the result of someone putting in the time to refine their understanding of the problem, and express it in code. By discussing the project one could reasonably hope to learn more about their understanding of the problem domain, or about the choices they made when reifying that understanding into an artifact useful for computation.

Now that no longer appears to be the case.

Which isn't to say there's no longer any skill involved in producing well engineered software that continues to function over time. Or indeed that there aren't classes of software that require interesting novel approaches that AI tooling can't generate. But now anyone with an idea, some high level understanding of the domain, and a few hundred dollars a month to spend, can write out a plan can ask an AI provider to generate them software to implement that plan. That software may or may not be good, but determining that requires a significant investment of time.

That change fundamentally changes the dynamics of "Show HN" (and probably much else besides).

It's essentially the same problem that art forums had with AI-generated work. Except they have an advantage: people generally agree that there's some value to art being artisan; the skill and effort that went into producing it are — in most cases — part of the reason people enjoy consuming it. That makes it rather easy to at least develop a policy to exclude AI, even if it's hard to implement in practice.

But the most common position here is that the value of software is what it does. Whilst people might intellectually prefer 100 lines of elegant lisp to 10,000 lines of spaghetti PHP to solve a problem, the majority view here is that if the latter provides more economic value — e.g. as the basis of a successful business — then it's better.

So now the cost of verifying things for interestingness is higher than the cost of generating plausibly-interesting things, and you can't even have a blanket policy that tries to enforce a minimum level of effort on the submitter.

To engage with the original question: if one was serious about extracting the human understanding from the generated code, one would probably take a leaf from the standards world where the important artifact is a specification that allows multiple parties to generate unique, but functionally equivalent, implementations of an idea. In the LLM case, that would presumably be a plan detailed enough to reliably one-shot an implementation across several models.

However I can't see any incentive structure that might cause that to become a common practice.

adampunk 2 days ago

>a plan detailed enough to reliably one-shot an implementation across several models.

What. Why should this be an output? Why if I make a project should I be responsible for also making this, an entirely different and much more difficult and potentially impossible project? If I come and show you a project that require required thousands of sessions to make I also have to show you how to one shot it in multiple models? Does that even make sense?

jgraham 23 hours ago

To be clear: I don't think it will happen.

But the point of comparison is something like the HTML specification. That's supposed to be a document that is detailed enough about how to create an implementation that multiple different groups can produce compatible implementations without having any actual code in common.

In practice it still doesn't quite work: the specification has to be supplemented with testsuites that all implementations use, and even then there often needs to be a feedback loop where new implementations find new ambiguities or errors, and the specification needs to be updated. Plus implementors often "cheat" and examine each other's behaviour or even code, rather than just using the specification.

Nevertheless it's perhaps the closest thing I'm familiar with to an existing practice where the plan is considered canonical, and therefore worth thinking about as a model for what "code as implementation detail" would entail in other situations.

adampunk 18 hours ago

I think the looping part is what stops this from being a practical solution. If we imagine that the actual code required some iteration in order to put down, I don’t know that we could say there is a one shot equivalent without testing that. Sometimes there may not even be an equivalent.

It’s possible that the solution to code being implementation detail is to be less precious about it and not more. I don’t really have an answer here and I don’t think anyone does because it’s all very new and it is hard to manage.

There’s also a pretty normal way in which this is going to diverge and perhaps already has. Developers are building local bespoke skills just like they used to develop and still do local bespoke code to make their work more efficient. They may be able to do something that you or I cannot using the same models—-there’s no way to homologize their output. It would be like asking someone to commit their dot files alongside the project output. Regardless of whether or not it was the right thing to do no one would do it.

acedTrex 3 days ago

> (2) there's nothing wrong with more people being able to create and share things

There is very clearly many things wrong with this when the things being shown require very little skill or effort.

dang 3 days ago

That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.

imiric 3 days ago

Which users?

The future you're concerned with defending includes bots being a large part of this community, potentially the majority. Those bots will not only submit comments autonomously, but create these projects, and Show HN threads. I.e. there will be no human in the loop.

This is not something unique to this forum, but to the internet at large. We're drowning in bot-generated content, and now it is fully automated.

So the fundamental question is: do you want to treat bots as human users?

Ignoring the existential issue, whatever answer you choose, it will inevitably alienate a portion of existing (human) users. It's silly I have to say this, but bots don't think, nor "care", and will keep coming regardless.

To me the obvious answer is "no". All web sites that wish to preserve their humanity will have to do a complete block of machine-generated content, or, at the very least, filter and categorize it correctly so that humans who wish to ignore it, can. It's a tough nut to crack, but I reckon YC would know some people capable of tackling this.

It's important to note that this state of a human driving the machine directly is only temporary. The people who think these are tools as any other are sorely mistaken. This tool can do their minimal effort job much more efficiently, cheaper, and with better results, and it's only a matter of time until the human is completely displaced. This will take longer for more complex work, of course, but creating regurgitated projects on GitHub and posting content on discussion forums is a very low bar activity.

lelanthran 2 days ago

> That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.

Is it really that difficult to identify bot accounts right now? Or people who create a HN account only to post their project?

That seems like low-hanging fruit that should be picked immediately.

CuriouslyC 3 days ago

Taking a good picture requires very little effort once you've found yourself in the right place. You gonna shit on Ansel Adams?

newswasboring 2 days ago

Why exactly is the skill level required for something a gating parameter?

rfw300 3 days ago

Why should it be? The agent session is a messy intermediate output, not an artifact that should be part of the final product. If the "why" of a code change is important, have your agent write a commit message or a documentation file that is polished and intended for consumption.

addcn 2 days ago

This reduces down to the problem of summarization - a quite difficult one. At commit time it’s difficult to know what questions readers will have. You can get close but never all the way there.

Pre AI when engineers couldn’t find the answer in commit messages or documentation they would ask the author “why” and that human would “compute” the summary on demand.

I think that’s what I expect to do with these agent sessions - I don’t want more markdown, I want to ask it questions on demand. Git AI (https://github.com/git-ai-project/git-ai) uses the prompts that way. I think that model will win out. Save sessions. Read/ask questions relevant to the current agent’s work.

On asking peers. This is regrettably on the way out today - I’ll ask engineers about complex code they generated and they can’t give good answers. I think it’s because it all happened so fast — they didn’t sit with the problem for 48 hours. So even if they steered the agent thoughtfully it’s hard to remember all the decisions they made a week later.

D-Machine 3 days ago

It should be a distillation of the session and/or the prompts, at bare minimum. No, it should not include e.g. research-type questions, but it should include prompts that the user wrote after reading the answers to those research-type questions, and perhaps some distillation of the links / references surfaced during the research.

Prompts probably should be distilled / summarized, especially if they are research-based prompts, but code-gen prompts should probably be saved verbatim.

Reproducibility is a thing, and though perfect reproducibility isn't desirable, something needs to make up for the fact that vibe-coding is highly inscrutable and hard to review. Making the summary of the session too vague / distilled makes it hard to iterate and improve when / if some bad prompts / assumptions are not documented in any way.

tpmoney 3 days ago

You have the source code though. That is the "reproducibility" bit you need. What extra reproducibility does having the prompts give you? Especially given that AI agents are non-deterministic in the first place. To me the idea that the prompts and sessions should be part of the commit history is akin to saying that the keystroke logs and commands issued to the IDE should be part of the commit history. Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is. You have the code that made build X and you have the code that made build X+1. As long as you can reliably recreate X and X+1 from what you have in the code, you have reproducibility.

D-Machine 3 days ago

> You have the source code though. That is the "reproducibility" bit you need.

I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs, not reproducing outputs perfectly. As you said, current LLM models are non-deterministic, so we can't have perfect reproducibility based on the prompts, but, when trying to fix a bug, having the basic prompts we can see if we run into similar issues given a bad prompt. This gives us information about whether the bad / bugged code was just a random spasm, or something reflecting bad / missing logic in the prompt.

> Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.

I am really using "reproducibility" more abstractly here, and don't mean perfect reproducibility of the same code. I.e. consider this situation: "A developer said AI wrote this code according to these specs and prompt, which, according to all reviewers, shouldn't produce the errors and bad code we are seeing. Let's see if we can indeed reproduce similar code given their specs and prompt". The less evidence we have of the specifics of a session, the less reproducible their generated code is, in this sense.

xmcqdpt2 2 days ago

It's not reproducible though.

Even with the exact same prompt and model, you can get dramatically different results especially after a few iterations of the agent loop. Generally you can't even rely on those though: most tools don't let you pick the model snapshot and don't let you change the system prompt. You would have to make sure you have the exact same user config too. Once the model runs code, you aren't going to get the same outputs in most cases (there will be date times, logging timestamps, different host names and user names etc.)

I generally avoid even reading the LLM's own text (and I wish it produced less of it really) because it will often explain away bugs convincingly and I don't want my review to be biased. (This isn't LLM specific though -- humans also do this and I try to review code without talking to the author whenever possible.)

newswasboring 2 days ago

You are talking about documenting the intent of a piece of software if I understand correctly. But isn't that what READMEs and comments are for?

tpmoney 2 days ago

> I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs

But "to what purpose" is where this all loses me. What do you gain from seeing what was said to the AI that generated the bug? To me it feels like these sorts of things will fall into 3 broad categories:

1) Underspecified design requirements

2) General design bugs arising from unconsidered edge cases

3) AI gone off the rails failures

For items in category 1, these are failures you already know how to diagnose with human developers and your design docs should already be recorded and preserved as part of your development lifecycle and you should be feeding those same human readable design documents to the AI. The session output here seems irrelevant to me as you have the input and you have the output and everything in between is not reproducible with an AI. At best, if you preserve the history you can possibly get a "why" answer out of it in the same way that you might ask a dev "why did you interpret A to mean B", but you're preserving an awful lot of noise and useless data int the hopes that the AI dropped something in it's output that shows you someplace your spec isn't specific or detailed enough that a simple human review of the spec wouldn't catch anyway once the bug is known.

For category 2, again this is no different from the human operator case and there's no value that I can see in confirming in the logs that the AI definitely didn't consider this edge case (or even did consider it and rejected it for some erroneous reason). AI models in the forms that folks are using them right now are not (yet? ever?) capable of learning from a post mortem discussion about something like that to improve their behavior going forward. And its not even clear to me that even if they were, you would need the output of the session as opposed to just telling the robot "hey at line 354 in foo.bar you assumed that A would never be possible, but no place in the code before that point asserts it, so in the future you should always check for the possibility of A because our system can't guarantee it will never occur."

And as for category 3, since it's going off the rails, the only real thing to learn is whether you need a new model entirely or if it was a random fluke, but since you have the inputs used and you know they're "correct", I don't see what the session gives you here either. To validate whether you need a new model, it seems that just feeding your input again and seeing if you get a similar "off the rails" result is sufficient. And if you don't get another "off the rails" result, I sincerely doubt your model is going to be capable of adequately diagnosing its own internal state to sort out why you got that result 3 months ago.

xigoi 2 days ago

The source code is whatever is easiest for a human to understand. Committing AI-generated code without the prompts is like committing compiler-generated machine code.

e3bc54b2 3 days ago

> It should be a distillation of the session and/or the prompts, at bare minimum.

Huh, I thought that's what commit message is for.

D-Machine 3 days ago

I mean, sure, a good, detailed commit message is perfectly fine to me in place of the prompts / a session distillation. But I am not holding my breath for vibe-coders to properly review their code and make such a commit message. But, if they, do, great! No need for prompt / session details.

hatmanstack 3 days ago

Completely agree. Until recently I only let LLMs write my commit messages, but I've found that versioning the plan files is the better artifact, it preserves agentic decisions and my own reasoning without the noise.

My current workflow: write a detailed plan first, then run a standard implement -> review loop where the agent updates the plan as errors surface. The final plan doc becomes something genuinely useful for future iterations, not just a transcript of how we got there.

sunir 3 days ago

In my case I have set up the agent is the repo. The repo texts compose the agent’s memory. Changes to the repo require the agent to approve.

Repos also message each other and coordinate plans and changes with each other and make feature requests which the repo agent then manages.

So I keep the agents’ semantically compressed memories as part of the repo as well as the original transcripts because often they lose coherence and reviewing every user submitted prompt realigns the specs and stories and requirements.

aspenmartin 3 days ago

post mortems / bug hunting -- pinpointing what part of the logic was to blame for a certain problem.

monster_truck 3 days ago

this is what granular commits are for, the kilobytes long log of claude running in circles over bullshit isn't going to help anyone

ahupp 3 days ago

I think the parent comment is saying “why did the agent produce this big, and why wants it caught”, which is a separate problem from what granular commits solve, of finding the bug in the first place.

xmcqdpt2 2 days ago

There is no "why." It will give reasons but they are bullshit too. Even with the prompt you may not get it to produce the bug more than once.

If you sell a coding agent, it makes sense to capture all that stuff because you have (hopefully) test harnesses where you can statistically tease out what prompt changes caused bugs. Most projects wont have those and anyway you don't control the whole context if you are using one of the popular CLIs.

aspenmartin 2 days ago

If I have a session history or histories, I can (and have!) mine them to pinpoint where an agent either did not implement what it was supposed to, or understand who asked for a certain feature an why, etc. It complements commits, sessions are more like a court transcript of what was said / claimed (session) and then you can compare that to what was actually done (commits).

causal 2 days ago

Then look at the code, the session will only confuse. To read an LLM's explanation is to anthropomorphize what will just be a probabilistic incident.

aspenmartin 2 days ago

no you look at the session to understand what the context was for the code change -- what did you _ask_ the llm to do? did it do it? where did a certain piece of logic go wrong? Session history has been immensely useful to me and it serves as an important documentation of the entire flow of the project. I don't think people should look at session histories at all unless they need to.

lacunary 3 days ago

but that takes more tokens and time. if you just save the raw log, you can always do that later if you want to consume it. plus, having the full log allows asking many different questions later.

AndrewKemendo 3 days ago

How’s it any different than a diff log?

xboxnolifes 3 days ago

Better question: how is it in any way similar?

AndrewKemendo 3 days ago

If you read the history of both and assuming that there’s good comments and documentation, it shows you the reasoning that went into the decision-making

onion2k 3 days ago

Conceptually this is very similar to the question of whether or not you should squash your commits. To the point that it's really the same question.

If you think you should squash commits, then you're only really interested in the final code change. The history of how the dev got there can go in the bin.

If you don't think you should squash commits then you're interested in being able to look back at the journey that got the dev to the final code change.

Both approaches are valid for different reasons but they're a source of long and furious debate on every team I've been on. Whether or not you should be keeping a history of your AI sessions alongside the code could be useful for debugging (less code debugging, more thought process debugging) but the 'prefer squash' developers usually prefer to look the existing code rather than the history of changes to steer it back on course, so why would they start looking at AI sessions if they don't look at commits?

All that said, your AI's memory could easily be stored and managed somewhere separately to the repo history, and in a way that makes it more easily accessible to the LLM you choose, so probably not.

mikepurvis 3 days ago

I've generally been in the squash camp but it's more out of a sense of wanting a "clean" and bisectable repo history. In a word where git (and git forges) could show me atomic merge commits but also let me seamlessly fan those out to show the internal history and iteration and maybe stuff like llm sessions, I'd be into that.

And yes, it's my understanding that mercurial and fossil do actually do more of this than git does, but I haven't actually worked on any projects using those so I can't comment.

globular-toast 3 days ago

This only works if the software is still crafted by a human and merely using AI as a tool. In that case the use of AI is similar to using editor macros or test-driven development. I don't need to see that process playing out in real time.

It's less clear to me if the software isn't crafted by a human at all, though. In that case I would prefer to see the prompt.

onion2k 3 days ago

I agree that fully agentic development will change things, but I don't know how. I'm still very much in the human-in-the-loop phase of AI where I want to understand and verify that it's not done anything silly. I care far more about the code that I'm deploying than the prompt that got me there and probably will for a long time. So will my prodsec team.

D-Machine 3 days ago

Appreciate this very sane take. The actual code always is more important than the intentions, and this is basically tautological.

When dealing with a particularly subtle / nuanced issue, knowing the intentions is still invaluable, but this is usually rare. How often AI code runs you into these issues is currently unclear, and constantly changing (and how often such issue are actually crucial depends heavily on the domain).

D-Machine 3 days ago

I think this is the right analogy, contrary to some other very poor ones in this thread. Yes, it is rare to really look at commit messages, but it can be invaluable in some cases.

With vibe-coding, you risk having no documentation at all for the reasoning (AI comments and tests can be degenerate / useless), but the prompts, at bare minimum, reveal something about the reasoning / motivation.

Whether this needs to be in git or not is a side issue, but there is benefit to having this available.

mikepurvis 3 days ago

Depending on the size it might make sense as a kind of commit metadata reference to external, like the signed-off-by field.

Chat-Session-Ref: claude://gjhgdvbnjuteshjoiyew

Perhaps that could also link out to other kinds of meeting transcripts or something too.

onion2k 3 days ago

That wouldn't be very portable. A benefit of committing to your history is that it lives with the code no matter where the code or the AI service you use goes.

mikepurvis 2 days ago

That's true. I was thinking of it as being more like how LFS works since presumably the LLM contexts could be large and you wouldn't necessarily want all of them on every clone.

sjkoelle 2 days ago

it depends how long of a leash you give it

yuvrajangads 2 days ago

The session itself is mostly noise. Half of it is the model going down wrong paths, backtracking, and trying again. Storing that alongside the commit is like saving your browser history next to your finished code.

What actually helps is a good commit message explaining the intent. If an AI wrote the code, the interesting part isn't the transcript, it's why you asked for it and what constraints you gave it. A one-paragraph description of the goal and approach is worth more than a 200-message session log.

I think the real question isn't about storing sessions, it's about whether we're writing worse commit messages because we assume the AI context is "somewhere."

D-Machine 3 days ago

Obviously yes, at least if not the prompts in the session, some simple / automated distillation of those prompts. Code generated by AI is already clearly not going to be reviewed as carefully as code produced by humans, and intentions / assumptions will only be documented in AI-generated comments to some limited degree, completely contingent on the prompt(s).

Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.

Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing (or, unfortunately, deployment surprises).

With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.

EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like. Also, how can you teach juniors to improve their vibe-coding if you can't see anything about their sessions?

tpmoney 3 days ago

> Obviously yes

I don't think this is obvious at all. We don't make the keystroke logs part of the commit history. We don't make the menu item selections part of the commit history. We don't make the 20 iterations you do while trying to debug an issue part of the commit history (well, maybe some people do but most people I know re-write the same file multiple times before committing, or rebase/squash intermediate commits into more useful logical commits. We don't make the search history part of the commit history. We don't make the discussion that two devs have about the project part of the commit history either.

Some of these things might be useful to preserve some of the time either in the commit history or along side it. For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.

D-Machine 3 days ago

> well, maybe some people do but most people I know re-write the same file multiple times before committing, or rebase/squash intermediate commits into more useful logical commits

Right, agreed on this, we want a distillation, not documentation of every step.

> For example, having some documentation for the intent behind a given series of commits and any assumptions made can be quite valuable in the future, but every single discussion between any two devs on a project as part of the commit history would be so much noise for very little gain. AI prompts and sessions seem to me to fall into that same bucket.

Yes, documenting every single discussion is a waste / too much to process, but I do think prompts at least are pretty crucial relative to sessions. Prompts basically are the core intentions / motivations (skills aside). It is hard to say whether we really want earlier / later prompts, given how much context changes based on the early prompts, but having no info about prompts or sessions is a definite negative in vibe-coding, where review is weak and good documentation, comments, and commit messages are only weakly incentivized.

> Some of these things might be useful to preserve some of the time either in the commit history or along side it

Right, along side is fine to me as well. Just something has to make up for the fact that vibe-coding only appears faster (currently) if you ignore the fact it is weakly-reviewed and almost certainly incurring technical debt. Documenting some basic aspects of the vibe-coding process is the most basic and easy way to reduce these long-term costs.

EDIT: Also, as I said, information about the prompts quickly reveals competence / incompetence, and is crucial for management / business in hiring, promotions, managing token budgets, etc. Oh, and of course, one of the main purposes of code review was to teach. Now, that teaching has to shift toward teaching better prompting and AI use. That gets a lot harder with no documentation of the session!

Lyrkan 2 days ago

> Also, as I said, information about the prompts quickly reveals competence / incompetence, and is crucial for management / business in hiring, promotions, managing token budgets, etc.

I fail to see why you would need that kind of information to find out if someone is not competent. This really sounds like an attempt at crazy micro-management.

The "distillation" that you want already exists in various forms: the commit message, the merge request description/comments, the code itself, etc.

Those can (and should) easily be reviewed.

Did you previously monitor which kind of web searches developpers where doing when working on a feature/bugfix? Or asked them to document all the thoughts that they had while doing so?

adampunk 2 days ago

Just out of curiosity, what session size do you think we're talking about here?

voxleone 2 days ago

I’ve found a workflow that feels both structured and respectful of professional craft, especially in the context of this thread. I don’t just "vibe code" and let an LLM fill in the blanks. I use a classic design discipline (UML and use-cases) to document the process: 1. Start with requirements – 2.Define use cases - 3. Implement classes/objects (Architecture first, not after-the-fact refactors) 4. Add constraints and invariants (Contracts, boundaries, failure modes, etc.) - 5. Let the agent work inside that frame, pausing at milestones for human oversight.

Those UML/use-case/constraint artifacts aren’t committed as session logs per se, but they are part of the author’s intent and reasoning that gets committed alongside the resulting code. That gives future reviewers the why as well as the what, which is far more useful than a raw AI session transcript.

Stepping back, this feels like a decent and dignified position for a programmer in 2026: humans retain architectural judgement --> AI accelerates boilerplate and edge implementation --> version history still reflects intent and accountability rather than chat transcripts. I can’t afford to let go of the productivity gains that flow from using AI as part of a disciplined engineering process, but I also don’t think commit logs should become a dumping ground for unfiltered conversation history.

YoumuChan 3 days ago

Should my google search history be part of the commit? To that question my answer is no.

travisjungroth 3 days ago

I was looking for an analogy and this is a good one.

The noise to signal ratio seems so bad. You’d have to sift through every little “thought”. If I could record my thought stream would I add it to the commit? Hell no.

Now, a summary of the reasoning, assumptions made and what alternatives were considered? Sure, that makes for a great message.

rerdavies 3 days ago

Heck no. I don't even read the vast majority of the cack that my AI spits out for my own prompts. Why would I inflict that on anyone else?

charcircuit 3 days ago

And not all google searches you do while working on that commit may even be related to that commit. It may be entirely unrelated, or sensitive information that should not be made public.

woctordho 2 days ago

If you archive the session, you automatically archive all Google search history (queries and outputs) that the AI did, and it's usually relevant to the project.

veunes 2 days ago

Perfect analogy. Nobody cares how many times you googled "how to center a div" before finally writing proper CSS. Same goes for agents: I only care about the final architectural state and performance, not how the model brain-farted over trivial boilerplate because of a scuffed system prompt

ovidev 10 hours ago

In the past months, I've been building a SaaS using Claude Code. I haven't written a single line of code.

This is the breakdown of my process - I use tons of .md files serving as a shared brain between Claude and me:

- CLAUDE.md is in the root of the repo, and it's the foundation - it describes the project vision, structure, features, architecture decisions, tech, and others. It then goes even more granular and talks about file sizes, method sizes, problem-solving methodologies (do not reinvent the wheel if a well-known library is already out there), coding practices, constraints, and other aspects like instructions for integration tests. It's basically the manual for the project vision and plan, and also for code writing. Claude reads it every session.

- Every feature has its own .md file, which is maintained. That file describes implementation details, decisions, challenges, and anything that is relevant when starting to code on the feature, and also when it's picked up by a new session.

- At a higher level, above features, I create pairs of roadmap.md and handoff.md. Those pairs are the crucial part of my process. They cover wider modules (e.g., licensing + payments + emailing features) and serve as a bridge between sessions. Roadmap.md is basically a huge checklist, based on CLAUDE.md and features .md docs, and is maintained. The handoff.md contains the current state, session notes, and knowledge. A session would start by getting up to speed with Claude.md and the specific roadmap.md + handoff.md that you plan to work on now and would end by updating the handoff, roadmap, and the impacted features.

This structure greatly helps preserve crucial context and also makes it very easy to use multi-agent.

Of course the commits and PRs are also very descriptive, however the engine is in the .md files.

mandel_x 3 days ago

I’ve been thinking about a simple problem: We’re increasingly merging AI-assisted code into production, but we rarely preserve the thing that actually produced it — the session. Six months later, when debugging or reviewing history, the only artifact left is the diff. So I built git-memento. It attaches AI session transcripts to commits using Git notes.

denismi 3 days ago

> the only artifact left is the diff

You also have code comments, docs in the repo, the commit message, the description and comments on the PR, the description and comments on your Issue tracker.

Providing context for a change is a solved problem, and there is relatively mature MCP for all common tooling.

rerdavies 3 days ago

Not to mention AIs predilection for copious and overly abundant comments.

jwbron 3 days ago

The former GitHub CEO has a startup and this was their first release. They call it checkpoints: https://entire.io/

I copied it for my own tooling to make it work a bit better for my workflows.

onel 2 days ago

Have you been using it? How useful do you find it?

latexr 3 days ago

A better solution would be to read and understand the code before committing it.

mandel_x 3 days ago

People won’t do that, unfortunately. We are a dying breed (I hate it). I went against my own instincts and vibe code this, works as a proof of concept.

You can see the session (including my typos) and compare what was asked for and what you got.

kace91 3 days ago

Your starting point is that people won’t read code, and you expect them to read someone’s llm session from git?

bonoboTP 3 days ago

Another LLM will read it of course.

midnitewarrior 3 days ago

Sounds like we've got an Ape Coder here!

https://rsaksida.com/blog/ape-coding/

dang 3 days ago

Related ongoing thread:

Ape Coding [fiction] - https://news.ycombinator.com/item?id=47206798 - March 2026 (93 comments)

add-sub-mul-div 3 days ago

Personally, I'm not going to be complicit in reshaping the field around the lazy and undisciplined.

hrmtst93837 2 days ago

Reading and understanding the code is essential, but in a collaborative environment, reviewing AI-generated code can be complex. It's about balancing trust in AI with the need for human oversight.

midnitewarrior 3 days ago

I already invented this in my head, thanks for not making me code it.

Excellent idea, I just wish GitHub would show notes. You also risk losing those notes if you rebase the commit they are attached to, so make sure you only attach the notes to a commit on main.

mandel_x 3 days ago

I added an action that will add a comment with the notes in GitHub so that you can see them directly.

I did work around squash to collect all sessions and concatenate them as a single one

midnitewarrior 3 days ago

Well done.

There is so much undefined in how agentic coding is going to mature. Something like what you're doing will need to be a part of it. Hopefully this makes some impressions and pushes things forward.

hrmtst93837 3 days ago

Capturing the AI session is practical for maintaining context during reviews. It will help when tracing back through history after the fact.

ZoomZoomZoom 2 days ago

If by AI you mean the LLM-based tools common now, then I don't want the commits in PRs I'm going to review to bring any more noise than they already do. The human operator is responsible for every line, like they always were.

If by AI you mean non-supervised, autonomous conscience (as I believe the term has to be reserved for), then the answer is again no, as it's as responsible for the quality of its PRs as humans.

If the thing writing code is the former, but there's no human or responsible representative of the latter in the loop, then the code shouldn't be even suggested for consideration in a project where any people do participate. In such case there's no point in storing any additional information as the code itself doesn't have any value (besides electricity wasted to create it) and can be substituted on demand.

Commit comments are generally underused, though, as a result of how forges work, but that's another discussion.

abustamam 3 days ago

I don't think it should be. I think a distilled summary of what the agent did should be committed. This requires some dev discipline. But for example:

Make a button that does X when clicked.

Agent makes the button.

I tell it to make the button red.

Agent makes it red.

I test it, it is missing an edge case. I tell it to fix it.

It fixes it.

I don't like where the button is. I tell it to put it in the sidebar.

It does that.

I can go on and on. But we don't need to know all those intermediaries. We just need to know Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases here. All tests passing. 2026-03-01

And that document is persisted.

If later, the button gets deleted or moved again or something, we can instruct the agent to say why. Button deleted because not used and was noisy. 2026-03-02

This can be made trivial via skills, but I find it a good way to understand a bit more deeply than commit messages would allow me to do.

Of course, we can also just write (or instruct agents to write) better PRs but AFAICT there's no easy way to know that the button came about or was deleted by which PR unless you spelunk in git blame.

raincole 3 days ago

I hope people start doing that. Not that it has any practical usage for the repo itself, but if everyone does that, it'd probably make it much easier for open weight models to catch up the proprietary ones. It'd be like a huge crowdsourced project to collect proprietary models' output for future training.

nz 20 hours ago

If a commit's job is to capture state at a particular point in time, so that it can be reproduced and understood, then it _also_ needs to include the exact model used. This is only useful if you can ensure access to the previous versions of the model -- which is not something that providers are willing to do (in fact, they regularly "retire" old models). The only transparent way forward is to open source the models, along with their weights, and their training set (to verify that the weights match, and to retrain the model when new architectures and new hardware are released).

Not insisting upon this, would be similar to depending on a SaaS to compile and packages software, and being totally cool with it. Both LLMs and build systems, convert human-friendly notation into machine-friendly notation. We should hold the LLM companies to the same standards of transparency that we hold the people who make things like nix, clang, llvm, cmake, cargo, etc.

testbyhuman_tor 15 hours ago

Interesting discussion about tracking the AI session as part of the commit. But there is a missing piece in most of these workflows that bugs me: none of them include the step where a real person who was not involved in building it tries to use the result.

You can document the prompt chain, the plan, the design doc. But if nobody outside the team ever touches it before it ships, you are still flying blind on whether the thing actually works for a human who encounters it cold. The AI session log tells you what was intended. It does not tell you what was understood.

tototrains 3 days ago

I considered this and even built a claude code extension to bring history/chats into the project folder.

Not once have I found it useful: if the intention isn't clear from the code and/or concise docs, the code is bad and needs to be polished.

Well written code written with intention is instantly interpretable with an LLM. Sending the developer or LLM down a rabbit hole of drafts is a waste of cognition and context.

causal 3 days ago

If a car is used to get you somewhere, should you put the exhaust in bags to bring with you?

mannanj 3 days ago

Is session context car exhaust? Or is it the Event logs and code of the CPU/car's brains?

adampunk 2 days ago

It's exhaust. Retrospectively, chat is essentially worthless. You're going to chase hallucinations down conversations that maybe didn't even impact code.

mannanj 2 days ago

I have difficulty believing chat is worthless.

And I think that not everyone will entertain or chase the hallucinations down. Or maybe enough non-hallucinations are chased that it is valuable.

causal 2 days ago

You seem to be thinking like it is 2020 and humans will be the ones reading the chat. It is just context-bloat for whatever agent ends up reading it.

adampunk 2 days ago

The problem is “hallucination” might mean this:

You tell the agent “go do thing A” the agent replies “sure thing buddy, I’ll do that”, noodles, then reports “I’ve done that thing!” MEANWHILE, in reality, the agent has done something totally different—maybe they did a subset, failed completely, made an unrelated change.

Later, you find and FIX the problem but the chat has no record of it because there is *genuinely no point* to telling an agent “you screwed that up,” unless you want that agent to fix it.

Now that session has a completely fictitious story which will seem to correspond with reality only because of out of band action. It’s worse than worthless!

Session chat has only a tenuous and poorly marked match to reality, there is no reason to preserve it.

cyberax 3 days ago

We use flight data recorders on airplanes, though.

brendanmc6 3 days ago

A few things really leveled up both my software quality and my productivity in the last few months. It wasn’t session history, memory files, context management or any of that.

1. Writing a spec with clear acceptance criteria.

2. Assigning IDs to my acceptance criteria. Sounds tedious, but actually the idea wasn’t mine, at some point an agent went and did it without me asking. The references proved so useful for guiding my review that I formalized the process (and switched from .md to .yaml to make it easier).

3. Giving my agents a source of truth to share implementation progress so they can plan their own tasks and more effectively review.

Of course, I can’t help myself, I had to formalize it into a spec standard and a toolkit. Gonna open source it all soon, but I really want feedback before I go too far down the rabbit hole:

https://acai.sh

eru 3 days ago

> Assigning IDs to my acceptance criteria. Sounds tedious, [...]

Might be tedious for a human, but agents should do that just fine?

brendanmc6 3 days ago

Yup and they do, but then I figured out that I can just write loosely structured yaml and the ids come for free. I then encourage the agents to tag and reference them everywhere, especially tests.

xhcuvuvyc 3 days ago

No? For the same reason I don't want to work 8 hours a day with the boss looking over my shoulder.

rDr4g0n 2 days ago

When I began reviewing my teammate’s PRs with AI generated code in it, something started to feel weird. It took a bit, but I realized the problem: I am not reviewing the work my teammate did.

What are they even supposed to do with feedback on the code? It has to be translated by my teammate into the language of the work they did, which is the conversation they had with the AI agent.

But the conversation isn't the "real work": the decisions made in the conversation are the real work. That is what needs capture and review.

So now I know why code reviews are kinda wrong, what can we do to have meaningful reviews of the work my teammates have done?

What I landed on is aiming to capture more and more “work” in the form of a spec, review the spec, ignore the code. this isn't novel or interesting. HOWEVER...

For the large, messy, legacy codebases I work in today, I don’t like the giant spec driven development approach that is most popular today. It’s too risky to solely trust the spec because it touches so much messy code with so many gotchas. However, with the rate of AI generated code rolling in, I simply can’t switch context quickly enough to review it all efficiently. Also, it’s exhausting.

The approach I have been refining is defining very small modules (think a class or meaningful collection of utils) with a spec and a concise set of unit tests, generating code from the spec, then not reading or editing the generated code.

Any changes to the code must be made to the spec, and the code re-generated. This puts the PR conversation in the right place, against the work I have done: which is write the spec.

So far the approach has worked for replacing simple code (eg: a nestjs service that has a handful of public methods, a bit of business logic, and a few API client calls). PRs usually have a handful of lines of glue code to review, but the rest are specs (and a selection of “trust” unit tests) and the idea is that the code can be skipped.

AI review bots still review the PR and comment around code quality and potential security concerns, which I then translate into updates to the spec.

I find this to be a good step towards the codegen future without totally handing over my (very messy and not very agent friendly) codebases.

rurban 17 hours ago

No, the (github issue) ID is enough.

For my AI coding sessions I just point opencode to the issue. It does a plan, (build, ie) implements and tests the plan, and commits it. For reference you always have the issue, revise the issue when something changed.

We always worked like this, recording the thinking and planning part is silly. You can always save your session data.

gck1 22 hours ago

If you want an answer to the OP question, just ask AI to analyze the session jsonl files in your user directory and give you statistics of what's in there.

You'll find that at least half of it is noise.

If you put that in commits, you lose the ability to add "study git commits to ground yourself" in your agents.md or prompts. Because now you'll have 50%+ noise in your active session's context window.

Context window is precious. Guard it however you can.

lionkor 3 days ago

Sone of the best engineers I've seen use commit messages to explain their intent, sometimes even in many sentences, below the message.

I bet, without trying to be snarky, that most AI users don't even know you can commit with an editor instead of -m "message" and write more detail.

It's good that AI fans are finding out that commits are important, now don't reinvent the wheel and just spend a couple minutes writing each commit message. You'll thank yourself later.

handfuloflight 2 days ago

This gave me a good chuckle. Anyone doing "good" engineering with coding agents are having them write detailed, high signal, low noise git commits... it's table stakes.

kzahel 3 days ago

I would love to be able to share all my sessions automatically. But I would want to share a carefully PII/secrets redacted session. I added a "session sharing" feature to my agent wrapper that just grabs innerHTML and uploads to cloudflare. So I can share how I produced/vibe coded an entire project from start to finish.

For example: https://github.com/kzahel/PearSync/blob/main/sessions/sessio...

I think it's valuable to share that so people who are interested can see how you interact with agents. Sharing raw JSONL is probably a waste and contains too many absolute paths and potential for sharing unintentionally.

https://github.com/peteromallet/dataclaw?tab=readme-ov-file#... is one project I saw that makes an attempt to remove PII/secrets. But I certainly wouldn't share all my sessions right now, I just don't know what secrets accidentally got in them.

hakanderyal 3 days ago

I created a system which I call 'devlog'. Agent summarizes what it did & how it did in a concise file, and its gets committed along with first prompt and the plan file if any. Later due to noise & volume, I started saving those in a database and adding only devlog id to commit nowadays.

Now whenever I need to reason with what agent did & why, info is linked & ready on demand. If needed, session is also saved.

It helps a lot.

reflectt 3 days ago

The session capture problem is harder than it looks because you need to capture intent, not steps.

A coding session has a lot of 'left turn, dead end, backtrack' noise that buries the decision that actually mattered. Committing the full session is like committing compiler output — technically complete, practically unreadable.

We've been experimenting with structured post-task reflections instead: after completing significant work, capture what you tried, what failed, what you'd do differently, and the actual decision reasoning. A few hundred tokens instead of tens of thousands. Commits with a reflection pointer rather than an embedded session.

The result is more useful than raw logs. Future engineers (or future AI sessions) can understand intent without replaying the whole conversation. It's closer to how good commit messages work — not 'here's what changed' but 'here's why'.

Dang's point about there being no single session is also real. Our biggest tasks span multiple sessions and multiple contributors. 'Capture the session' doesn't compose. 'Capture the decision' does.

D-Machine 3 days ago

Something like "it is important to document core / crucial prompts somewhere" covers it. Whether this should be in git or elsewhere is trickier, but doing vibe-coding without documenting any aspect of the process is a recipe for disaster.

Also, how can we (or future AI models) hope to improve if there is only limited and summary documentation of AI usage?

vtemian 3 days ago

Git was designed for humans.

Commits, branches, and the entire model works really well for human-to-human collaboration, but it starts to be too much for agent-to-human interactions.

Sharing the entire session, in a human, readble way, offering a rich experiences to other humans to understand, is way better then having git annotations.

That's why we built https://github.com/wunderlabs-dev/claudebin.com. A free and open-source Claude Code session sharing tool, which allows other humans to better understand decisions.

Those sessions can be shared in PR https://github.com/vtemian/blog.vtemian.com/pull/21, embedded https://blog.vtemian.com/post/vibe-infer/ or just shared with other humans.

CloakHQ 2 days ago

The plan.md approach solves something I've been struggling with on a browser automation project. When you're building something stateful (browser sessions, fingerprinting logic etc.) the "why" behind decisions gets lost fast. Not just for other devs, but for the AI itself in future sessions.

One thing I've added on top of the plan/project structure: a short `decisions.md` that logs only the non-obvious choices, like "tried X, it caused Y issue, went with Z instead". Basically the things that would make future-me or a future agent waste time rediscovering.

Do you find the plan.md files stay useful past the initial build, or do they mostly just serve as a commit artifact?

claud_ia 2 days ago

The raw session noise — repeated clarifications, trial-and-error prompting, hallucinated APIs — probably isn't worth preserving. But AI sessions contain one category of signal that almost never makes it into code or commit messages: the counterfactual space — what approaches were tried and rejected, which constraints emerged mid-session, why the chosen implementation looks the way it does.

That's what architectural decision records (ADRs) are designed to capture, and it's where the workflow naturally lands. Not committing the full transcript, but having the agent synthesize a brief ADR at the close of each session: here's what was attempted, what was discarded and why, what the resulting code assumes. Future maintainers — human or AI — need exactly that, and it's compact enough that git handles it fine.

angry_octet 2 days ago

Since the code is literally the answer to What comes next after this prompt the answer is yes. Unfortunately there is also a hidden random seed in the engine (which this doesn't seem to address). But if you capture the seed, the exact version of the software and the prompt, the system is completely deterministic.

However there is an unpleasant reality: the system could be incredibly brittle, with the slightest change in input or seed resulting in significantly different output. It would be nice if all small and seemingly inconsequential input perturbations resulted in a cluster of outputs that are more or less the same, but that seems very model dependent.

Lerc 3 days ago

I would say not, because it would lead some to think that what was said to the model represented what output was desired. While there is quite a bit of correlation with describing what you want with the output you receive, the nature of models as they stand mean you are not asking for what you want, you are crafting the text that elicits the response that you want. That distinction is important, and is model specific. Without keeping an archive of the entire model used to generate the output, the conversation can be very misleading.

Conversations may also be very non-linear. You can take a path attempting something, roll back to a fork in the conversation and take a different path using what you have learned from the models output. I think trying to interpret someone else's branching flow would be more likely to create an inaccurate impression than understanding.

D-Machine 3 days ago

An important consideration somewhat missing in discussion in this thread: if we don't carefully document AI-assisted coding sessions, how can we ever hope to improve our use of AI coding tools?

This applies both to future AI tools and also experts, and experts instructing novices.

To some degree, the lack of documenting AI sessions is also at the core of much of the skepticism toward the value of AI coding in general: there are so many claims of successes / failures, but only a vanishingly small amount of actual detailed receipts.

Automating the documentation of some aspects of the sessions (skills + prompts, at least) is something both AI skeptics and proponents ought to be able to agree on.

EDIT: Heck, if you also automate documenting the time spent prompting and waiting for answers and/or code-gen, this would also go a long way to providing really concrete evidence for / against the various claims of productivity gains.

ramoz 3 days ago

We think so as well with emphasis on "why" for commits (i.e. intent provenance of all decisions).

https://github.com/eqtylab/y just a prototype, built at codex hackathon

The barrier for entry is just including the complete sessions. It gets a little nuanced because of the sheer size and workflows around squash merging and what not, and deciding where you actually want to store the sessions. For instance, get notes is intuitive; however, there are complexities around it. Less elegant approach is just to take all sessions in separate branches.

Beyond this, you could have agents summarize an intuitive data structure as to why certain commits exist and how the code arrived there. I think this would be a general utility for human and AI code reviewers alike. That is what we built. Cost /utility need to make sense. Research needs to determine if this is all actually better than proper comments in code

handfuloflight 2 days ago

Why is this so complicated? Store a session id that points to the full conversation artifacts (off repo) with the git commit and look it up ad hoc as needed. Why do the conversations need to be in the git repos?

veunes 2 days ago

The idea of "saving prompts for reproducibility" is dead on arrival. LLMs are non-deterministic by nature. In a year, they'll deprecate this model's API, and the new version will spit out completely different code with entirely new bugs for the exact same prompt. A prompt isn't source code, it's just a temporary crutch for stochastic generation. And if I have to read 50 pages of schizophrenic dialogue with an LLM just to understand why a specific function exists, that PR gets an instant reject. The artifact is and always will be readable code plus a sane commit message. Dumping a log of hallucinations will only make debugging a nightmare when this Frankenstein inevitably falls apart in prod tbh

jwrallie 2 days ago

This is something that should be possible in principle, since the machines underneath are deterministic, it’s just a limitation of the implementation.

daemonk 3 days ago

I did this in the beginning and realized I never went back to it. I think we have to learn to embrace the chaos. We can try to place a couple of anchors in the search space by having Claude summarize the code base every once in a while, but I am not sure if even that is necessary. The code it writes is git versioned and is probably enough to go on.

esafak 3 days ago

It is a great way to document your thinking. I think we should get in the habit of checking conversations in with git notes or something.

xhcuvuvyc 3 days ago

Just get it to write more comments about reasoning as you go.

mock-possum 3 days ago

I’ve had the same thought, but after playing around with it, it just seems like adding noise. I never find myself looking at generated code and wondering “what prompt lead to that?” There’s no point, I won’t get any kind of useful response - I’m better off talking to the developer who committed it, that’s how code review works.

what 3 days ago

The developer that committed won’t know because they didn’t write it…

You can avoid the noise with git notes. Add the session as a note on the commit. No one has to read them if they’re not interested.

alainrk 3 days ago

My complete reasoning, notes, errors have never been part of the commit. I don't see a valid reason on why the raw conversation must be included. Rather I have hooks (or just "manually" invoked) to process all of it and update the relevant documentation that I've been putting under docs/.

D-Machine 3 days ago

If you also ensure the AI writes relevant (and correct) docs, and also code comments and commit message, then I agree there is not much need for extra info, e.g. prompts / session distillation. I am not sure that that is the case currently (though we might be getting there soon at least in some cases).

alainrk 3 days ago

Agree with this, I've been testing how AGENTS.md and similar can do to automatically have these behaviours and I feel (it's just feeling) it's been improving over time. Clearly depends a lot on the agent, the model, the codebase size and so on.

D-Machine 3 days ago

Yup. Realistically there will always be simple changes that AI can handle completely (docs, comments, and commit message), and other changes where some human input will be hugely valuable.

Until then, it makes sense to automatically include some distillation of the AI generation process, by default, IMO.

brainlounge 3 days ago

The more fundamental question is: Is there information in the AI-coding session that should be preserved? Only if the answer is "yes", the next question becomes: Where do we store that data?

git is only one possible location.

I think there is very valuable information in session logs, like the prompts, or the usage statistics at the end of the session, which model was used etc. But git history or the commit messages should focus on the outcome of the work, not on the process itself. This is why the whole issue discussion before work in git starts is also typically kept separately in tickets. Not in git itself, but close to it.

There're platforms like tulpal.com which move the whole local agent-supported process to the server and therefore have much better after-the-fact observability in what happened.

natex84 3 days ago

If the model in use is managed by a 3rd party, can be updated at will, and also gives different output each time it is interacted with, what is the main benefit?

If I chat with an agent and give an initial prompt, and it gets "aspect A" (some arbitrary aspect of the expected code) wrong, I'll iterate to get "aspect A" corrected. Other aspects of the output may have exactly matched my (potentially unstated) expectation.

If I feed the initial prompt into the agent at some later date, should I expect exactly "aspect A" to be incorrect again? It seems more likely the result will be different, maybe with some other aspects being "unexpected". Maybe these new problems weren't even discussed in the initial archived chat log, since at that time they happened to be generated in a way in alignment with the original engineers expectation.

fragmede 3 days ago

Because intent matters and 6 months or 3 years down the line and it's time to refactor, and the original human author is long gone, there's a difference if the prompt was "I need a login screen" vs "I need a login screen, it should support magic link login and nothing else".

superturkey650 3 days ago

Isn’t that point of design docs and not the commit log?

bear3r 3 days ago

reproducibility isn't really the goal imo. more like a decision audit trail -- same reason code comments have value even though you can't regenerate the code from them. six months later when you're debugging you want to know 'why did we choose this approach' not 'replay the exact conversation.'

sin5d 24 hours ago

[dead]

tokiory 2 days ago

Hell no, there are many companies, which doesn't use any AI (or just using copilot). I would hate to read a commit history where every commit had a "conversation" attached to it. Code should be human-first, always

daxfohl 3 days ago

I think so. If nothing else, when you deploy and see a bug, you can have a script that revives the LLMs of the last N commits and ask "would your change have caused this?" Probably wouldn't work or be any more efficient than a new debugging agent most of the time, but it might sometimes and you'd have a fix PR ready before you even answered the pager, and a postmortem that includes WHY it did so, and a prompt to prevent that behavior in the future. And it's cheap, so why not.

Maybe not a permanent part of the commit, but something stored on the side for a few weeks at a time. Or even permanently, it could be useful to go back and ask, "why did you do it that way?", and realize that the reason is no longer relevant and you can simplify the design without worrying you're breaking something.

micw 3 days ago

IMO it depends a bit, but in most cases: No!

If you do proper software development (planing, spec, task breakdown, test case spec, implementation, unit test, acceptance test, ...) implementation is just a single step and the generated artifact is the source code. And that's what needs to be checked in. All the other artifacts are usually stored elsewhere.

If you do spec and planing with AI, you should also commit the outcome and maybe also the prompt and session (like a meeting note on a spec meeting). But it's a different artifact then.

But if you skip all the steps and put your idea directly to an coding agent in the hope that the result is a final, tested and production ready software, you should absolutely commit the whole chat session (or at least make the AI create a summary of it).

ffsm8 3 days ago

LLMs frequently hallucinate and go off on wild goose chases. It's admittedly gotten a lot better, but it still happens.

From that perspective alone the session would be important meta information that could be used to determine the rationale of a commit - right from the intent (prompt) to what the harness (Claude code etc) made of it. So there is more value in keeping it even in your second scenario

micw 3 days ago

I try to use AI incremental and verify each result. If it goes mad, I just revert and start over. It's a bit slower but ensures consistency and correctness and it's still a huge improvement over doing everything manually.

jon_north 2 days ago

This seems like a very good idea, not just because of the desire to do human archaeology at times, but also to let further agentic exploration occur. It would be best if it became a separate section of the commit that could just be blank or contain other documentation in the case of human authorship. The commit message shouldn't get longer and longer. It should continue to tell the concise story that humans and LLMs alike consume quickly to gain some initial synthesis.

So I like the link's approach quite a bit.

JustFinishedBSG 2 days ago

I understand the idea but the way I work, a commit isn't "a" session, it's potentially tens of sessions with branching in each session.

I honestly don't know if I'm doing something very wrong or if I have a very different working style than many people, but for me "just give the prompt/session" isn't a possibility because there isn't one.

I'm probably incredibly inefficient, because even when I don't use AI it is the same, a single commit is usually many different working states / ideas / branches of things I tried and explored that have been amended / squashed.

otar 3 days ago

In the ideal world a specification file should be committed to the repository and then linked to the PR/commit. But it slows you down and is no longer a vibe coding?

Soon only implementation details will matter. Code can be generated based on those specifications again and again.

brendanmc6 3 days ago

I agree, and am so captivated with the idea that I decided to build a whole toolkit around it. Would be very keen to get feedback if anyone wants to try it when it’s ready.

https://acai.sh

So far this workflow is the only way I’ve been able to have any real success running parallel agents or assigning longer running tasks that don’t get thrown out.

jumploops 3 days ago

I've been experimenting with a few ways to keep the "historical context" of the codebase relevant to future agent sessions.

First, I tried using simple inline comments, but the agents happily (and silently) removed them, even when prompted not to.

The next attempt was to have a parallel markdown file for every code file. This worked OK, but suffered from a few issues:

1. Understanding context beyond the current session

2. Tracking related files/invocations

3. Cold start problem on an existing codebases

To solve 1 and 3, I built a simple "doc agent" that does a poor man's tree traversal of the codebase, noting any unknowns/TODOs, and running until "done."

To solve 2, I explored using the AST directly, but this made the human aspect of the codebase even less pronounced (not to mention a variety of complex edge-cases), and I found the "doc agent" approach good enough for outlining related files/uses.

To improve the "doc agent" cold start flow, I also added a folder level spec/markdown file, which in retrospect seems obvious.

The main benefit of this system, is that when the agent is working, it not only has to change the source code, but it has to reckon with the explanation/rationale behind said source code. I haven't done any rigorous testing, but in my anecdotal experience, the models make fewer mistakes and cause less regressions overall.

I'm currently toying around with a more formal way to mark something as a human decision vs. an agent decision (i.e. this is very important vs. this was just the path of least resistance), however the current approach seems to work well enough.

If anyone is curious what this looks like, I ran the cold start on OpenAI's Codex repo[0].

[0]https://github.com/jumploops/codex/blob/file-specs/codex-rs/...

crossroadsguy 3 days ago

Goodness no! Sometimes I literally SHOUT at these agents/chats and often stoop down to using cuss words, which I am not proud of, but surprisingly it has shown to work here and there. As real as that is, I'd not want that on record in a commit.

gingersnap 3 days ago

My instinct is to say that I don't want the session as part of the commit. For me that is like a Slack thread discussing the new feature, and that is not something I would commit. I think that the split shouldn't be "is this done with a machine"=> commit, I think the split for AI should be the same as before. Is it code or changes of code, then it should be included. Is it discussing, going back and forth, that is not commited now. On the other hand, if you do a plan that is then implemented, I actually do think it makes sense to save the plan, either as commit, or if you save that back to the issue.

burntoutgray 3 days ago

YES! The session becomes the source code.

Back in the dark ages, you'd "cc -s hello.c" to check the assembler source. With time we stopped doing that and hello.c became the originating artefact. On the same basis the session becomes the originating artefact.

lich_king 3 days ago

I'm not sure this analogy holds, for two reasons. First, even in the best case, chain-of-thought transcripts don't reliably tell you what the agent is doing and why it's doing it. Second, if you're dealing with a malicious actor, the transcript may have no relation to the code they're submitting.

The reason you don't have to look at assembly is that the .c file is essentially a 100% reliable and unambiguous spec of how the assembly will look like, and you will be generating the assembly from that .c file as a part of the build process anyway. I don't see how this works here. It adds a lengthy artifact without lessening the need for a code review. It may be useful for investigations in enterprise settings, but in the OSS ecosystem?...

Also, people using AI coding tools to submit patches to open-source projects are weirdly hesitant to disclose that.

WD-42 3 days ago

This is only true if a llm session would produce a deterministic output which is not the case. This whole “LLMs are the new compiler” argument doesn’t hold water.

bonoboTP 3 days ago

"Deterministic" is not the issue either, it's that small changes of the input will cause unknown changes in the output. You might theoretically achieve determinism and reproducibility for the exact same input (seeding the random number generators etc.), but the issue is that even if you formulate your request just a little differently, by changing punctuation for example, you'll get an entirely different output.

With compilers, the rules are clear, e.g. if you replace variable names with different ones, the program will still do the same thing. If you add spaces in places where whitespace doesn't matter, like around operators, the resulting behavior will still be the same. You change one function's definition, it doesn't impact another function's definition. (I'm sure you can nitpick this with some edge case, but that's not the point, it overwhelmingly can be relied upon in this way in day to day work.)

sumeno 3 days ago

cc was deterministic, you could be confident that the same code produced the same assembly each time you ran it

That is very much not the case with LLMs

ehnto 3 days ago

LLMs are non-deterministic, you would end up with a different output even if you paste the same conversation in. Even if the model was identical at the time you tried to reproduce it. Which gets less likely as time passes.

Also, why would you need to reproduce it? You have the code. Almost any modification to said code would benefit from a fresh context and refined prompt.

An actual full context of a thinking agent is asinine, full of busy work, at best if you want to preserve the "reason" for the commits contents maybe you could summarise the context.

Other than that I see no reason to store the whole context per commit.

root_axis 3 days ago

This seems wrong, like committing debug logs to the repo. There's also lots of research showing that models regularly produce incorrect trace tokens even with a correct solution, so there's questionable value even from a debugging perspective.

resters 2 days ago

What would be most useful is some kind of context representation that could be upgraded as better models get developed. If you put it in the commit then you need to compare contexts when comparing code across time. But if you make the context include the changes in the code over time, then the future context will be better at debugging a bug in code written years earlier. The years-old context is likely going to be obsolete by that time anyway.

jiveturkey 3 days ago

https://entire.io thinks so

reg_dunlop 3 days ago

Thank you! Was looking for this company. Founder was high up at GitHub. Really an interesting proposition

nomilk 2 days ago

The way I've been storing prompts is a directory in the project called 'prompts' and an .md file for each topic/feature. Since I usually iterate a lot on the same prompt (to minimise context rot), I store many versions of the same prompt ordered chronologically (newest at top).

That way if I need to find a prompt from some feature from the past, I just find the relevant .md file and it's right at the top.

Interestingly, my projects are way better documented (via prompts) than they ever were in the pre-agentic era.

ryan_velazquez 2 days ago

If the agent is like a compiler, show me the source code.

I'm not sure about becoming part of the repo/project long term but I think providing your prompts as part of the pull request makes the review much easier because the reviewer can quickly understand your _intent_. If your intent has faulty assumptions or if the review disagrees with the intent, that should be addressed first. If the intent looks good, a reviewer can then determine if you (or your coding agent) have actually implemented it first.

fladrif 3 days ago

I think this is a lot of "kicking can down the road" of not understanding what code the ai is writing. Once you give up understanding the code that is written there is no going back. You can add all the helper commit messages, architecture designs, plans, but then you introduce the problem of having to read all of those once you run into an issue. We've left readability on the wayside to the alter of "writeability".

The paradigm shift, which is a shift back, is to embrace the fact that you have to slow down, and understand all the code the ai is writing.

anthonyrstevens 13 hours ago

Did you review all the code-gen code that might have been created in your projects pre-AI? Scaffolding, boilerplate, extended autocomplete, etc?

kaycey2022 2 days ago

This feels woefully inadequate. It should be saving everything. Not just the prompts and replies, but also the tool calls and skill invocations. If that is too much, then why even save anything in the session?

Right now this paradigm is so novel to us that we don’t know if what is being saved is useful in anyway or just hoarding garbage.

There are some who (rightly IMO) just neatly squash their commits and destroy the working branch after merging. There are others who would rather preserve everything.

alansaber 2 days ago

Given that LLM providers capture this information anyway, seems only fair to let the consumer do the same

visarga 3 days ago

Yes, it should remain part of the commit, and the work plan too, including judgements/reviews done with other agents. The chat log encodes user intent in raw form, which justifies tasks which in turn justify the code and its tests. Bottom up we say the tests satisfy the code, which satisfies the plan and finally the user intent. You can do the "satisfied/justified" game across the stack.

I only log my own user messages not AI responses in a chat_log.md file, which is created by user message hook in the repo.

bloomca 3 days ago

I don't think it's worth to include the session -- it would bloat the context too much anyway.

However, I do think that a higher-level description of every notable feature should be documented, along with the general implementation details. I use this approach for my side projects and it works fairly well.

The biggest question whether it will scale, I suspect that no, and I also suspect it is probably better to include nothing than a poor/disjointed/rare documentation of the sessions.

willbeddow 3 days ago

Increasingly, I'd like the code to live alongside a journal and research log. My workflow right now is spending most of my time in Obsidian writing design docs for features, and then manually managing claude sessions that I paste them back and forth into. I have a page in obsidian for each ongoing session, and I record my prompts, forked paths, thoughts on future directions, etc. It seems natural that at some point this (code, journal, LLM context) will all be unified.

gavinray 2 days ago

This is what the Github CEO recently announced as a product/company:

https://entire.io/

Original blogpost goes over motivations + workflow:

https://entire.io/blog/hello-entire-world/

rhgraysonii 3 days ago

I think the decisions it made along the way are worth tracking. And it’s got some useful side effects with regard to actually going through the programming and architecture process. I made a tool that really helps with this and finds a pretty portable middle ground that can be used by one person or a team too, it’s flexible. https://deciduous.dev/

dogas 2 days ago

I created a tool that will automatically suck in claude sessions into a separate repo. It sanitizes any sensitive data like API keys. Our team finds this useful to share sessions + context.

https://github.com/gammons/ai-session

saratogacx 3 days ago

I've gotten into the habit of having the LLM produce a description of their process and summarize the change, Than I add that along with the model I used after my own commit message. It lets me know where I use AI and what I thought it did as well as what I thought it did.

The entire prompt and process would be fine if my git history was subject to research but really it is a tool for me or anyone else who wants to know what happened at a given time.

131hn 2 days ago

Vibecoded code is not C, python, ts, je or whatever.

It need to be considered as a compiled output of vbc-c, vbc-python, or vbc-ts, or vbc-js.

Keeping the source code (the prompt) is very natural, when compiled binaries “vibecoded” output is lacking _context_ and _motivation_ (which the source code / prompt provides)

dboreham 3 days ago

I've thought about this, and I do save the sessions for educational purposes. But what I ended up doing is exactly what I ask developers to do: update the bug report with the analysis, plan, notes etc. In the case there's a single PR fixing one bug, GitHub and Claude tend to prefer this information go in the PR description. That's ok for me since it's one click from the bug.

dboreham 2 days ago

Should also have noted that I put the "prompt" in the bug report. So my top level Claude prompt becomes "Please take a look at issue #xxx".

Garlef 2 days ago

I think this is the wrong mental model.

Instead, we need better (self-explaining) translation from spec to code. And better tools that help us navigate codebases we've not written ourselves.

For example, imagine a UI where you click on a feature spec file and it highlights you all the relevant tests and code.

globular-toast 3 days ago

Like any discussion about AI there are two things people are talking about here and it's not always clear which:

1. Using LLMs as a tool but still very much crafting the software "by hand",

2. Just prompting LLMs, not reading or understanding the source code and just running the software to verify the output.

A lot of comments here seem to be thinking of 1. But I'm pretty sure the OP is thinking of 2.

ottah 2 days ago

How could this possibly be of any value. Commit history is not a grab bag of every random thing that happened during the development process. It's a series of checkpoints that lets you back out of bad decisions.

kkarpkkarp 3 days ago

For my own projects in private repos I would benefit from exporting the session. For example if I need to return to the task, it could be great to give it as a context

For my work as one of developers in team, no. The way I prompt is my asset and advantage over others in a team who always complain about AI not being able to provide correct solutions and secures my career

PeterStuer 2 days ago

The session might contain many artifacts that are not suited for open sourcing. The additional fine grained curation effort required might be more of an obstacle to open sourcing than the perceived benefits.

That said preserved private session records might be of great personal benefit.

phyzix5761 3 days ago

Have AI explain the reasoning behind the PR. I don't think people really care about your step by step process but reviewers might care about your approach, design choices, caveats, and trade offs.

That context could clarify the problem, why the solution was chosen, key assumptions, potential risks, and future work.

zkmon 2 days ago

Source code repositories such as git are for "sources" which are direct outputs of human effort. Sny generated stuff is not "source". It is same as the outputs of compile and build activities. Only the direct outputs of human effort should go into git.

jillesvangurp 2 days ago

I think that's covered by the YAGNI rule. It has very little value that rapidly drops off as you commit more code. Maybe some types of software you might want to store some stuff for compliance/auditing reasons. But beyond that, I don't see what you would use it for.

eddyg 2 days ago

https://specstory.com/specstory-cli is another tool in this space (it writes clean Markdown session files into the project for future reference)

rcy 3 days ago

I haven't adopted this yet, but have a feeling that something like this is the right level of recording the llm contribution / session https://blog.bryanl.dev/posts/change-intent-records

4b11b4 3 days ago

I like it, but it seemed test could capture some of these "behaviors". But having it in a single document is helpful for context

travisgriggs 3 days ago

In our (small) team, we’ve taken to documenting/disclosing what part(s) of the process an LLM tool played in the proposed changes. We’ve all agreed that we like this better, both as submitters and reviewers. And though we’ve discussed why, none of us has coined exactly WHY we like this model better.

voidUpdate 2 days ago

People keep talking about how LLMs are like a compiler from human language to code. We commit source code instead of just compiled machine code, so why should this be any different? The "source code" is the prompts

criley2 2 days ago

The prompt isn't very useful. You'd see the exact same prompt on every ticket for me.

Prompt 1: "Research <X> domain, think deeply, and record a full analysis in /docs/TICKET-123-NOTES.md"

Prompt 2: Based on our research, read TICKET-123 and began formulating solutions. Let's think this problem through and come up with multiple potential solutions. Document our solutions in TICKET-123-SOLUTIONS.md

Prompt 3: Based on Solution X, let's formulate a complete plan to implement. Break the work into medium sized tasks that a human could complete in 5-10 hours. Write our plan in TICKET-123-PLAN.md

I've often thought that some of this metadata, such as the research, solutioning and plan could be shared. I think they're valuable for code review. I've also translated these artifacts into other developer documentation paradigms.

But the prompts? You're not getting a lot of value there.

lelanthran 2 days ago

> Prompt 1: "Research <X> domain, think deeply, and record a full analysis in /docs/TICKET-123-NOTES.md"

> Prompt 2: Based on our research, read TICKET-123 and began formulating solutions. Let's think this problem through and come up with multiple potential solutions. Document our solutions in TICKET-123-SOLUTIONS.md

> Prompt 3: Based on Solution X, let's formulate a complete plan to implement. Break the work into medium sized tasks that a human could complete in 5-10 hours. Write our plan in TICKET-123-PLAN.md

Sounds to me that all these 10x - 100x "engineers" can be removed from the loop.

criley2 2 days ago

Almost! We are certainly on the precipice of the vast majority of white collar work being removed from the loop.

However, what each domain will tell you (engineering included) is that AI doesn't understand the full context of what you're doing and the point of the business and where to spend effort and where to cut corners. There is definitely still room for competent engineers to iterate here on the solutioning and plans to refine the AI work into something more sturdy.

Although this is only in domains where code quality truly matters. A lot of consumer software without SLA's are just vibe coding full speed now. No code review, AI writing 100% of the code.

voidUpdate 2 days ago

Judging by what I've seen recently, 100% LLM code is often buggy and not that great. I'd say code quality truly matters in all domains

criley2 2 days ago

What a utopia, where code quality matters in all domains!

In my opinion nearly the opposite is true: modern business solves for the "minimum viable quality". What is the absolute lowest quality the software can be and not tank the business.

ajam1507 2 days ago

If you could prove what "minimum viable quality" actually was this would be true. We have standards and procedures exactly because it is unknowable. One engineers idea of "good enough" might bankrupt the business.

lelanthran 2 days ago

> What a utopia, where code quality matters in all domains!

It does. The degree may not, though.

"We have a threshold of at least 5 hours total uptime every 24 hours" is still a quality bar, even if it is different to "We have a threshold of 99.99% uptime per year".

voidUpdate 2 days ago

Maybe you're different, but I prefer to write code that at least attempts to be performant, tidy and readable, as well as working at least 90% of the time. Maybe I don't achieve perfection, but I try to care about the quality of what I write

segmondy 3 days ago

It's already bad enough that people are saying there's too much code to read and review. You want to add session to it? Running it again, might not yield the same output. These models are non deterministic and models are often changed and upgraded.

D-Machine 2 days ago

Part of the reason there is too much code to read and review is because we lack the information to contextualize that code.

In many cases, seeing the prompts would help to dramatically speed up rejecting lazy slop PRs (or accepting more careful AI-assisted PRs).

pzygadlo 2 days ago

I decided to extract the session. I know it is risky (you might lose some info), but the trade-off is readability (and so data access).

jollymonATX 2 days ago

How verbose a history is even plausible to store and recall in modern git? This could add decent pressure on those mechanisms and the usability, for humans at least, would be taxing to consume.

tayo42 3 days ago

I feel like publishing the session is like publishing a sketch book. I don't need all of my mistakes and dumb questions recorded.

If that was important, why are we not already doing things like this. Should I have always been putting my browser history in commits?

semiinfinitely 2 days ago

Should your browser and search history be part of the commit too?

angry_octet 2 days ago

The thought you have while coding should be part of your workbook. This is a distillation of all the input and processing at the time, which can be a valuable clue for bug hunting and refactoring.

semiinfinitely 2 days ago

yeah lets commit all your thoughts too, and your personal journal. please also include a picture of your children and wife in the commit

angry_octet 2 days ago

This is an extremely ugly response and you should feel shame.

genghisjahn 3 days ago

If you can, run several agents. They document their process. Trade offs considered, reasoning. Etc. it’s not a full log of the session but a reasonable history of how the code came to be. Commit it with the code. Namespace it however you want.

galaxyLogic 2 days ago

Couldn't AI write the commit-message based on the prompts-history up till the commit thus making it easier to understand for any future reviewers what lead to and what is in a specific commit?

akoskomuves 2 days ago

I've done something similar with full analytics and options to add the full team. https://getpromptly.xyz

heavyset_go 3 days ago

If you need LLM sessions included to understand or explain commits, you're doing something wrong.

Saving sessions is even more pointless without the full context the LLM uses that is hidden from the user. That's too noisy.

ajam1507 2 days ago

Yes, please, it would solve the problem of the relentless HN discussions about how useful AI is for coding. We could actually see how productive people are using it.

Marlinski 2 days ago

If there was a standardized way to save this information, and tie it up to each commits, it would be insanely useful to amass a very valuable training dataset.

jtesp 2 days ago

according to entire.io it should. i have been keeping a local log for a while and have now been trying out entire. still not sure how i feel about it

pros:

intent is documented

reference to see how it was made

informal documentation

find flaws in your mental model

others can learn from your style

cons:

others can see how it was made

mention things you don't want others to see/know

people can see how dumb we are

reality:

you will judge and be judged for engineering competency not through code, but through words

robseed 2 days ago

Unedited AI generated code should have a different blame line than regular code, something like author_ai vs author.

wiseowise 2 days ago

No, because if AI is set to replace a human – their prompting skill and approach are the only things differentiating them from the rest of the grey mass.

tezza 3 days ago

I put a link to the LLM session at the end of the commit, and prefix with POH: if I wrote it by hand.

POH = Plain Old Human

Easy to achieve.

Why NOT include a link back? Why deprive yourself of information?

danhergir 3 days ago

One of the use cases i see for this tool is helping companies to understand the output coming from the llm blackbox and the process which the employee took to complete a certain task

tartoran 3 days ago

Why do that? Just let them deal with it.

rerdavies 3 days ago

Except it doesn't capture the majority of uses of AI, in my experience. In my current practice, the the vast majority of AI use is autocompletions, or small inline prompts. ("Fix this error."; "Open an ALSA midi connection" (things that avoid a to trip into awful documentation); "if (one of the query parameters is "gear='ir') ..." (things that break flow by forcing a trip into excellent but overly verbose Javascript URL API documentation)). Only very occasionally will I prompt for a big chunk of code.

danhergir 2 days ago

What coding agent are you using at your workplace?

x3n0ph3n3 3 days ago

I include my "plans" and a link to my transcript on all my PRs that include AI-generated code. If nothing else, others on my team can learn from them.

ares623 3 days ago

Maybe Git isn't the right tool to track the sessions. Some kind of new Semi-Human Intelligence Tracking tool. It will need a clever and shorter name though.

jiveturkey 3 days ago

How about Entire?

https://techcrunch.com/2026/02/10/former-github-ceo-raises-r...

https://news.ycombinator.com/item?id=46961345

MengerSponge 3 days ago

I like to have a cup of coffee before my morning commit.

Germans are much more diligent about staging before they commit.

nomel 3 days ago

I don't think git is the right tool for much of modern software, where things like blobs aren't even properly supported.

handfuloflight 2 days ago

Is there anything better?

DonThomasitos 3 days ago

Everything in git can and must be merge-able when merging branches. After all, git is a collaboration tool, not a undo-redo stack.

atmosx 2 days ago

It is a useful piece of information, but the session is not “long lived” in terms of git commit history lifetime.

ChicagoDave 2 days ago

The last 5 sessions. Beyond that I archive them outside the repo. But I do save them for review and summaries.

darepublic 3 days ago

If a human writes code, should the jira ticket be part of the commit? I am actually thinking about potential merits.

dolebirchwood 2 days ago

I drop a lot of F-bombs and other unpleasantries when I talk to the robots, so I'd rather not.

mixdup 2 days ago

LLMs are non-deterministic, so feeding that session back in possibly will get you a different output. Also, models change over time so you may not necessarily be able to run the session against the same model again

The whole point of the source code it generates is to have the artifact. Maybe this is somewhat useful if you need to train people how to use AI, but at the end of the day the generated code is the thing that matters. If you keep other notes/documentation from meetings and design sessions, however you keep that is probably where this should go, too?

angry_octet 2 days ago

They are completely deterministic? We introduce pseudo-randomness to assist with exploring the solution space.

The fact that models change is a great reason to be able to re-run a previous model and maintain revision control and repeatability.

The source code artifact is not really the point. Not anymore.

rclabs 3 days ago

hell to the no, in between coding sessions, I go out on plenty of sidebars about random topics that help me, the prompter understand the problem more. Prompts in this way are entirely related to context (pre-knowledge) that is not available to the LLMs.

ekjhgkejhgk 2 days ago

If a person writes code, should all the process be part of the commit?

stopthe 2 days ago

No. Even further than that, maintaining AGENTS.md and the like in your company repo, you basically train your own replacement. Which replacement will not be as capable as you in the long run, but few businesses will care. Anyway having some representation of an employee's thinking definitely lowers cost of firing that employee.

That is a cynical take and not very different from an advice to never write any documentation, or never help your teammates. Only that resemblance is superficial. In any organization you shouldn't help people stealing you time for their benefit (Sean Goedecke calls them predators https://www.seangoedecke.com/predators/).

On the other hand, it may be beneficial to privately save CLAUDE.md and other parts of persistent context. You may gitignore them (but that will be conspicuous unless you also gitignore .gitignore) or just load them from ~/.claude

I expect an enterprise version of Claude Code that will save any human input to the org servers for later use.

nautilus12 3 days ago

This would just record a lot of me cursing at and calling the AI an idiot.

hsuduebc2 3 days ago

I must say that would certainly show some funny converstaions in a log.

stubbi 3 days ago

Isn’t that what entire.io, founded by former GitHub CEO, is doing?

grahar64 3 days ago

If AI could reliably write good code then you shouldn't need to even commit the code as the general rule is you shouldn't commit generated code. Commit the session when you don't need to commit the code

hirako2000 3 days ago

What's the value given answers are not deterministic.

spion 3 days ago

A summary of the session should be part of the commit message.

SamDc73 3 days ago

pre-ai if I had to include Google search queries in a commit, I’d be so embarrassed I’d probably never commit code like ever

jes5199 3 days ago

instead of committing code, we should just save videos of all of the zoom meetings about the code

anishgupta 3 days ago

isn't a similar thing done by entire cli? the startup which raised $60M seed recently

Jach 3 days ago

In general, no, but sometimes, yes, or at least linked from the commit the same way user stories/issues are. Admittedly the 'sometimes' from my perspective is mostly when there's a need to educate fellow humans about what's possible or about good prompt techniques and workarounds for the AI being dumb. It can also reveal more of x% by AI, y% by human by for example diffing the outputs from the session against the final commits.

alansaber 2 days ago

If the full session capture is not encoded s.t it provides insight into architecture/mistakes, what was the point? There needs to be 1. complete capture (all tool calls etc) as well as 2. which is also curated to be readable (collapsible, chronological, easy to navigate etc). A .txt dump of agent COT is not particularly useful to anyone aside from another agent.

dekken_ 2 days ago

> AI writes code

you mean plagiarism?

lsc4719 3 days ago

Proof sketch is not proof

wakawaka28 2 days ago

Absolutely not. It's a huge waste of resources, even for machines. We don't commit every false start that humans make either. If you MUST think like this, what you really want to do is have a two-phase generation process. Generate the SPEC in one session, then try to get an AI to compile it to code. However, our LLMs are not set up to be deterministic, so there is little benefit to doing this. There are also many tweaks that people want to make which are stylistic or nitpicky, which have nothing to do with the quality of the original spec. You don't actually want to document EVERYTHING unless you work in aerospace or something.

exfalso 2 days ago

Nope. Especially with these agents the thinking trace can get very large. No human will ever read it, and the agent will fill up their context with garbage trying to look for information.

I understand the drive for stabilizing control and consistency, but this ain't the way.

weli 2 days ago

I agree so much

flammafex 2 days ago

No. Make me.

foamzou 3 days ago

No. Prompt-like document is enough. (e.g. skills, AGENTS.md)

vpribish 2 days ago

Mostly that’s going to be noise. But in some rare occasion I could see it being useful. So my unhelpful notion is that we might need a new thing - leave the commit message as a meaning- dense human-to-human message, and also have a development process flight-recorder log stored alongside. Storage is basically free so why not?

adampunk 2 days ago

I just cannot for the life of me understand the problem that this is solving. The only way that makes any sense is if sessions are atomic along with commits. If a session results in many commits in this becomes a fundamentally incomplete record, such as it was a record at all. Even if we do restrict to one session per commit, we are not in control over the agent’s context—-the session details will contain the user prompting the actions and the reasoning summaries. It will not contain a crucial part, which is how the agent assembles information about the project. So you’re left with a record that looks very complete and is silently incomplete. I don’t understand what the benefit of retaining that is.

dizlexic 2 days ago

westurner 2 days ago

"Pulp Project Policy on AI Generated Content / AI Assisted Coding" https://github.com/pulp#pulp-project-policy-on-ai-generated-... :

> [...]

> All contributors must indicate in the commit message of their contribution if they used AI to create them and the contributor is fully responsible for the content that they submit.

> This can be a label such as `Assisted By: <Tool>` or `Generated by: <Tool>` based on what was used. This label should be representative of the contribution and how it was created for full transparency. The commit message must also be clear about how it is solving a problem/making an improvement if it is not immediately obvious.*

From "Entire: Open-source tool that pairs agent context to Git commits" (2026) https://news.ycombinator.com/item?id=46964096 :

> But which metadata is better stored in git notes than in a commit message? JSON-LD can be integrated with JSON-LD SBOM metadata

FpUser 2 days ago

I keep trunk of conversation internally. No way I am putting it on github. The way I think, plan, interrogate LLM is part of competitive advantage in the market. I consider it my property and I would not ever let my clients read it (I pay for my usage of AI). Never mind some juicy language and being super straight and apolitical in a corporate sense. basically would be a major privacy breach

ETH_start 2 days ago

In principle, the documentation that's included in the code edit should have all the relevant information that a future agent would need.

igetspam 3 days ago

Yes.

EOM

est 3 days ago

obligatory: git notes

Lots of comments mentioned this, for those who aren't aware, please checkout

Git Notes: Git's coolest, most unloved feature (2022)

https://news.ycombinator.com/item?id=44345334

I think it's a perfect match for this case.

raggi 3 days ago

nope. Someones going to leak important private data using something like this.

Consider:

"I got a bug report from this user:

... bunch of user PII ..."

The LLM will do the right thing with the code, the developer reviewed the code and didn't see any mention of the original user or bug report data.

Now the notes thing they forgot about goes and makes this all public.

neomantra 2 days ago

I agree with you, but also we will start sharing these conversation traces more and more. That's why it is important for redaction to be in the export pipeline. There can be both deterministic (eg regex) and LLM-based redaction.

toddmorrow 2 days ago

yep. but I don't know what folder. maybe under logs. it's really a new category

nicman23 2 days ago

no and neither should be the actual code. you should at least remove the excessive bs that the ai comments and autisms about

trailblaze 2 days ago

[dead]

hackersk 3 days ago

[dead]

umairnadeem123 3 days ago

IMO this is solving the wrong problem. the session log is just noise - its like attaching your google search history to a stackoverflow answer to "prove" you did the research. nobody wants to read 500 lines of an agent going back and forth debugging a race condition.

the actual problem is that AI produces MORE code not better code, and most people using it aren't reviewing what comes out. if you understood the code well enough to review it properly you wouldn't need the session log. and if you didn't understand it, the session log won't help you either because you'll just see the agent confidently explaining its own mistakes.

> have your agent write a commit message or a documentation file that is polished and intended for consumption

this is the right take. code review and commit messages matter more now than they ever did BECAUSE there's so much more code being generated. adding another artifact nobody reads doesn't fix the underlying issue which is that people skip the "understand what was built" step entirely.

lidn12 2 days ago

[dead]

pipejosh 2 days ago

I settled on a similar workflow but across two agents instead of one session.

One agent writes task specs. The other implements them. Handoff files bridge the gap. The spec IS the session artifact because it captures intent, scope, and constraints before any code gets written.

The plan.md approach people are describing here is basically what happens naturally when you force yourself to write intent before execution.

adrian-vega 2 days ago

[dead]