Hacker News

237 points by RebelPotato 22 hours ago | 86 comments

> You could take an editor session, a diff, or a pull request and automatically split it into a series of more focused commits that are easier for people to review. This is one of the cases where the AI can reduce human review labor

I feel this should be a bigger focus than it is. All the AI code review startups are mostly doing “hands off” code review. It’s just an agent reviewing everything.

Why not have an agent create a perfect “review plan” for human consumption? Split the review up in parts that can be individually (or independently) reviewed and then fixed by the coding agent. Have a proper ordering in files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.

wazHFsRy 10 hours ago

> Why not have an agent create a perfect “review plan” for human consumption? Split the review up in parts that can be individually (or independently) reviewed and then fixed by the coding agent. Have a proper ordering in files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.

Yes exactly! I have been using this to create a comment on the PR, showing suggested review order and a diagram of how changes relate to each other. And even this super simple addition has been very helpful for code review so far!

(more on this: https://www.dev-log.me/pr_review_navigator_for_claude/)

kloud 9 hours ago

Exactly this. Existing code review tools have become insufficient as the volume of code has increased, and I would like to see more innovation here.

One idea that comes to mind to make review easier would be to re-create commits following Kent Beck's SB Changes concept - splitting structure changes (tidying/refactoring) and behavior changes (features). The structure changes could then be quickly skimmed (especially with good coverage) and it should save focus for review of the behavior changes.

The challenge is that it is not the same as just committing the hunks in a different order. But maybe a skill with a basic agent loop could work with the capabilities of models nowadays.

WilcoKruijer 8 hours ago

I experimented with a command for atomic commits a while ago. It explicitly instructed the agent to review the diff and group related changes to produce a commit history where every HEAD state would work correctly. I tried to get it to use `git add -p`, but it never seemed to follow those instructions. Might be time for another go at this with a skill.

telotortium 11 hours ago

Unfortunately GitHub doesn’t let you easily review commits in a PR. You can easily selectively review files, but comments are assumed to apply to the most recent HEAD of the PR branch. This is probably why review agents don’t natively use that workflow. It would probably not be hard to instruct the released versions of Opus or Codex to do this, however, particularly if you can generate a PR plan, either via human or model.

CuriouslyC 8 hours ago

I've been talking about having AI add comments to PRs to draw attention to things that should be given special attention since last May. I think most code review tools don't do this because A/B testing has shown people engage less/churn more with noisier review output.

SatvikBeri 11 hours ago

I do this. For example, the other day I made a commit where I renamed some fields of a struct and removed others, then I realized it would be easier to review if those were two separate commits. But it was hard to split them out mechanically, so I asked Claude to do it, creating two new commits whose end result must match the old one and must both pass tests. It works quite well.
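The "end result must match the old one" condition can be checked mechanically. Here is a minimal sketch, assuming the check is done by comparing git tree IDs via `git rev-parse <rev>^{tree}`; the helper names `tree_hash` and `same_end_result` are made up for illustration, and the `run` parameter exists only so the git call can be swapped out.

```python
import subprocess
from typing import Callable, List, Optional

# Hypothetical helper: a callable that runs a command and returns its stdout.
Runner = Callable[[List[str]], str]


def _run_git(args: List[str]) -> str:
    """Default runner: execute the command and return its stdout as text."""
    return subprocess.check_output(args, text=True)


def tree_hash(rev: str, run: Optional[Runner] = None) -> str:
    """Return the ID of the tree object that `rev` points at."""
    runner = run if run is not None else _run_git
    return runner(["git", "rev-parse", f"{rev}^{{tree}}"]).strip()


def same_end_result(old_rev: str, new_rev: str, run: Optional[Runner] = None) -> bool:
    """True if both revisions contain exactly the same file contents."""
    return tree_hash(old_rev, run) == tree_hash(new_rev, run)
```

If the original commit is still reachable (say, through a backup branch) and the split history ends at `HEAD`, something like `same_end_result("backup-branch", "HEAD")` would confirm that the split changed only the history, not the final code. It says nothing about whether the intermediate commit passes tests; that still has to be run separately.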

zmj 9 hours ago

I like this thought. Scaling review is definitely a bottleneck (for those of us who are still reading the code), and spending some tokens to make it easier seems worthwhile.

jasonjmcghee 11 hours ago

Yes please. There are many use cases where failure modes are similar to not using AI at all, which is useful.

Many very low risk applications of AI can add up to high payoff without high risk.

jonfw 10 hours ago

“I have a PR from <feature-branch> into main. Please break it into chunks and dispatch a background agent to review each chunk for <review-criteria>, and then go through the chunks one at a time with me, pausing between each for my feedback”

andai 17 hours ago

I wonder if the problem of idle time / waiting / breaking flow is a function of the slowness. That would be simple to test, because there are super fast 1000 tok/s providers now.

(Waiting for Cerebras coding plan to stop being sold out ;)

I've used them for smaller tasks (making small edits), and the "realtime" aspect of it does provide a qualitative difference. It stops being async and becomes interactive.

A sufficient shift in quantity produces a phase shift in quality.

That said, the main issue I find with agentic is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.

The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually. (i.e. I have my own homebrew "agent" that's just a loop of, I prompt it, it proposes edits, I accept or edit, repeat.)

So then the "synchronization" of the mental state is happening continuously, because there is no opportunity for desynchronization. Because you are the one driving. I call that approach semi-auto, or Power Coding (akin to Power Armor, which is wielded manually but greatly enhances speed and strength).
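A minimal sketch of that kind of loop, assuming the model call and the human review step are passed in as plain functions. The names `semi_auto_loop`, `propose`, and `review` are hypothetical and not from any real tool.

```python
from typing import Callable, Iterable, List, Optional


def semi_auto_loop(
    prompts: Iterable[str],
    propose: Callable[[str, List[str]], str],
    review: Callable[[str], Optional[str]],
) -> List[str]:
    """Run a prompt -> propose -> accept/edit cycle and return the applied edits.

    `propose` stands in for the model call; it sees the prompt and the edits
    applied so far. `review` stands in for the human: it returns the edit
    unchanged to accept it, a modified edit to amend it, or None to reject it.
    """
    applied: List[str] = []
    for prompt in prompts:
        proposed = propose(prompt, applied)
        final = review(proposed)
        if final is not None:
            # Only edits the human has seen and approved are ever applied.
            applied.append(final)
    return applied
```

The point of the shape is that nothing lands without passing through `review`, so there is never a batch of unseen changes to catch up on afterwards.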

rubenflamshep 10 hours ago

> That said, the main issue I find with agentic is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.

This is why I'm so skeptical of anyone running 6+ Claude sessions at a time. I've gotten to 5 but really that was across 3 sessions with 2 standing by just to commit stuff. And even with just 3 sessions I constantly lost where I was and wasted time re-orienting myself, doing work in the wrong session, etc.

>The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually.

Same, there's a fantastic flow state/momentum I can get in a single session just knocking off features. I don't mind switching between two sessions in this state but the experience is better when it's two different projects vs two different features on the same project. The complete context switch lets me re-orient more easily.

resize2996 10 hours ago

Warning: I was in two different projects experimenting with similar forms of DB access at the same time. Don't do that.

sourabhrakhya 6 hours ago

same

port11 5 hours ago

Waiting on AI is its own category, so I’m not entirely sure what ‘idle time’ means. Of course we could just go and read that study…

dybber 17 hours ago

You still have to synchronize with your code reviewers and teammates, so how well you work together in a team becomes a limiting factor at some point then I guess.

tuhgdetzhh 14 hours ago

Yes, and that constraint shows up surprisingly early.

Even if you eliminate model latency and keep yourself fully in sync via a tight human-in-the-loop workflow, the shared mental model of the team still advances at human speed. Code review, design discussion, and trust-building are all bandwidth-limited in ways that do not benefit much from faster generation.

There is also an asymmetry: local flow can be optimized aggressively, but collaboration introduces checkpoints. Reviewers need time to reconstruct intent, not just verify correctness. If the rate of change exceeds the team’s ability to form that understanding, friction increases: longer reviews, more rework, or a tendency to rubber-stamp changes.

This suggests a practical ceiling where individual "power coding" outpaces team coherence. Past that point, gains need to come from improving shared artifacts rather than raw output: clearer commit structure, smaller diffs, stronger invariants, better automated tests, and more explicit design notes. In other words, the limiting factor shifts from generation speed to synchronization quality across humans.

hibikir 10 hours ago

I've seen this happen over and over again well before LLMs, when teams are sufficiently "code focused" that they don't care much at all about their teammates. The kind that would throw a giant architectural change over a weekend. You then get to either freeze a person for days, or end up with codebases nobody remembers, because the bigger architectural changes are secret.

With a good modern setup, everyone can be that "productive", and the only thing that keeps a project coherent is if the original design holds, therefore making rearchitecture a very rare event. It will also push us to have smaller teams in general, just because the idea of anyone managing a project with, say, 8 developers writing a codebase at full speed seems impossible, just like it was when we added enough high performance, talented people to a project. It's just harder to keep coherence.

You can see this risk mentioned in The Mythical Man Month already. The idea of "The Surgery Team", where in practice you only have a couple of people truly owning a codebase, and most of the work we used to hand juniors just being done via AI. It'd be quite funny if the way we have to change our team organization moves towards old recommendations.

andai 10 hours ago

I've mostly done solo work, or very small teams with clear separation of concerns. But this reads as less of a case against power coding, and more of a case against teams!

EdNutting 13 hours ago

This thread seems to have re-identified Amdahl’s law in the context of software development workflow.

Agentic coding is only speeding up or parallelising a small part of the workflow - the rest is still sequential and human-driven.
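For reference, Amdahl's law says that if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A small sketch, with made-up numbers purely for illustration (the 30% figure below is an assumption, not a measurement):

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work gets `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The untouched part still takes its full time; only the rest shrinks.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / factor
    return 1.0 / remaining


# Assumed example: writing code is 30% of the workflow and agents make it 10x faster.
print(round(amdahl_speedup(0.3, 10), 2))  # prints 1.37
```

Under that assumption the whole workflow gets only about 1.37x faster, and even an infinitely fast agent could not push it past 1 / 0.7, roughly 1.43x.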

james_marks 9 hours ago

This is 100% the new bottleneck. We’re going to see a lot of agentic QA, E2E testing, etc. soon for this reason.

cyanydeez 10 hours ago

And it's abstracted as

Mythical Man Month -> Mythical Agent Swarm

zozbot234 13 hours ago

You can ask the agent to reverse engineer its own design and provide a design document that can inform the code review discussion. Plus, hopefully human code review would only occur after several rounds of the agent refactoring its own one-shot slop into something that's up to near-human standards of surveyability and maintainability.

Insanity 21 hours ago

Post had nothing to do with Haskell so the title is a bit misleading. But the rest of the article is good, and I actually think that Agentic/AI coding will probably evolve in this way.

The current tools are the infancy of AI assisted coding. It’s like the MS-DOS era. Over time maybe the backpropagating from “your comfort language” to “target language” could become commonplace.

josephcsible 20 hours ago

> Post had nothing to do with Haskell so the title is a bit misleading.

To be fair, that's not part of the article's title, but rather the title of the website that the article was posted to.

Insanity 20 hours ago

I know, but that's not typically how you see titles posted here. I'm just disappointed as I enjoy writing Haskell. :)

nakedneuron 11 hours ago

Agree. Gist of the FA is about "calm technology". Title should reflect it better.

Also agree on everything author mentions. I can't attest to all examples but I know what a UI is.

Author mentions center of focus of attention. We should hear more often about the periphery of our attention field. Its bandwidth so to speak is a magnitude lower compared to the center but it's still there and can guide some decisions quite unintrusively to flow.

(Major) eye movements are a detriment to attention, which itself should be treated like a commodity (in case of a UI thousands use, moreso like a borrowed commodity).

yoyohello13 19 hours ago

I was excited to see a non-AI article on this site for once. Oh well.

It was a good article though

lordgrenville 16 hours ago

Agreed. This website seems to prepend the blog name to each page's document.title

Would suggest that one of the mods remove it

ipnon 21 hours ago

Programming languages are the most interesting area in CS for the next 10 years. AI needs criteria for correctness that can't be faked, so the boundary between proof verification and programs will become fuzzier and fuzzier. The runtimes also need support for massively parallel development in a way that is totally unnecessary for humans.

shevy-java 14 hours ago

Is the article good? I found it of a surprisingly poor quality. Is my assessment incorrect? Basically it is an article that tries to convince people of how relevant AI is nowadays. I don't really see it like that at all and none of the "arguments" I found convincing.

kstenerud 17 hours ago

What I've found is that most people who dislike the chat interface aren't using it in a way that leverages its strengths.

Up until recently, LLMs just plain sucked. You'd set them on a task and then spend hours hand-holding them to output something almost correct.

Nowadays you can have a conversation with the chatbot, hash out a design, rubber duck and discuss what-ifs until you have a solid idea of the thing you're building, codified in a way an agent could understand, and now you have a PLAN.

From there, it's a matter of setting the agent in motion and checking from time to time to make sure it's not getting stuck on something under-specified.

That said, I've found that this kind of workflow works a lot better with claude than with gemini.

wazHFsRy 18 hours ago

I have the same feeling recently that we should focus more on using AI to enable us, to empower us to do the important things. Not take away but enhance: boring, clear boilerplate yes, design decisions no. And making reviewing easier is a perfect example of enhancing our workflow. Not reviewing for us, but supporting us.

I have recently been using this tiny[1] skill to generate an order on how to review a PR and it has been very helpful to me.

[1] https://www.dev-log.me/pr_review_navigator_for_claude/

clarity_hacker 7 hours ago

The review ordering problem is less about AI generation and more about graph traversal. Every diff already encodes a dependency graph (imports, function calls, type references), and optimal review order is just a topological sort weighted by cognitive load. GitHub's alphabetical ordering is zero-information when we have the full call graph. The hard part isn't producing the ordering — it's efficiently extracting and maintaining the dependency graph as code changes.
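A minimal sketch of the ordering step, assuming the dependency graph between changed files has already been extracted (which, as noted, is the hard part). It uses the standard library's `graphlib.TopologicalSorter` and leaves out any cognitive-load weighting; the function name `review_order` and the file names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def review_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order changed files so each one appears after the files it depends on.

    `depends_on` maps a changed file to the set of changed files it imports
    or calls into. Raises graphlib.CycleError if the dependencies are cyclic.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical diff touching three files.
changed = {
    "handler.py": {"service.py"},
    "service.py": {"models.py"},
    "models.py": set(),
}
print(review_order(changed))  # prints ['models.py', 'service.py', 'handler.py']
```

This puts definitions before their uses. Some reviewers prefer the opposite direction (entry points first), which is just the reversed list.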

totaa 7 hours ago

any extensions / third-party tools to visualise this?

eigenblake 17 hours ago

I have been considering what it would be like to give each function name a specific color and a color for each variable's type followed by a color derived from the hash of the symbol name and keywords would each be their specific type. And essentially printing a matrix of this, essentially transforming your code into a printable matrix "low-lod" or "mipmap" form. This could be implemented like the VSCode minimap but I think the right move here is to implement it as a hook that can modify the output of your agent. That way you can look at the structure of the code without reading the names in particular.
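A rough sketch of the "color derived from the hash of the symbol name" part, assuming SHA-256 as the hash and 24-bit ANSI escape codes for terminal output. Both choices, and the function names, are placeholders for illustration.

```python
import hashlib
from typing import Tuple


def symbol_color(name: str) -> Tuple[int, int, int]:
    """Map a symbol name to a stable RGB color using the first three hash bytes."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def color_block(name: str) -> str:
    """Render the symbol as a two-cell colored block using 24-bit ANSI escapes."""
    r, g, b = symbol_color(name)
    return f"\x1b[48;2;{r};{g};{b}m  \x1b[0m"


# The same name always gets the same color, so repeated uses line up visually.
print("".join(color_block(s) for s in ["parse", "tokens", "parse", "emit"]))
```

A real version would need a tokenizer for the target language and probably a palette that keeps unrelated names visually distinct, since raw hash bytes can land on near-identical colors.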

nakedneuron 11 hours ago

Great idea. As a "visual type" this would be so much more intuitive to decipher. I prefer TUIs over GUI exactly because they're simpler and work hard to focus on the essential. This is low hanging fruit to enhance TUIs.

tossandthrow 16 hours ago

I whole heartedly prefer chat interfaces over inline ai suggestions.

I find the inline stuff so incredibly annoying because they move around the text I am looking at.

coffeefirst 16 hours ago

Same! It feels like being shouted at nonstop by an overeager teacher's pet who's wrong 60% of the time.

I do appreciate in-IDE functionality that can search the codebase etc etc, but I want to hit a button when I need it.

benob 15 hours ago

I really like the "file lens" example:

> “Focus on…” would allow the user to specify what they're interested in changing and present only files and lines of code related to their specified interest.

> “Edit as…” would allow the user to edit the file or selected code as if it were a different programming language or file format.

matheus-rr 15 hours ago

The "junior dev" analogy is the one I keep coming back to, but the part people miss is the review surface area problem.

When a human junior writes code, they leave breadcrumbs of their thinking — commit messages, PR descriptions, comments explaining why they chose approach A over B. You can reconstruct their reasoning from the artifact trail.

Agents don't do this naturally. You get a diff with no context for why it went that direction. So the reviewer has to reverse-engineer the thinking from the code alone, which is actually harder than reviewing human code because there are no "tells" — no familiar coding style, no consistent patterns that hint at the developer's mental model.

The semi-auto approach mentioned upthread works precisely because it solves this: you were there for every decision, so there's nothing to reconstruct. The productivity loss from staying in the loop is offset by the time you save not having to audit opaque changes after the fact.

wazHFsRy 10 hours ago

Also with your real junior dev you build trust over time. With the agent I start over at a low trust level again and again so far.

camgunz 12 hours ago

The only way AI companies can recover their capex is to replace workers. That's why their interfaces are only facially built for the workers they're replacing (engineers, finance, etc) and why this is a non-starter: it totally undermines the business model.

danielvaughn 9 hours ago

Generally agree with the idea of calm technology, but I feel like inlay hints are a bad example. They actively give me anxiety because it makes the code feel harder to read, it takes my attention away from the code, and it feels more awkward to edit the text because you have these virtual characters getting in the way and having to re-render as you type, causing a shift in your cursor position. It's not at all calming for me, lol.

deanc 9 hours ago

All the problems highlighted with agentic coding are problems you face when working as a team of humans. Apply the same principles:

- Break down big problems into smaller ones

- Create extensive plan + documentation (context)

- Make sure some parts of the plan if possible can be done simultaneously and not create too many dependencies.

- Define success criteria (tests?)

Then just unleash the agents. The more you put in, the more you get out.

agnishom 12 hours ago

This is an amazing article. The HN title should be edited a bit. "Calm Technology - Beyond Agentic Coding"

nakedneuron 11 hours ago

Hard agree.

Narciss 10 hours ago

I did a bit of digging into why you think agentic coding is “not there yet”, and I think you are bashing a tool you have very little experience with and are using a bit wrongly.

Nothing wrong with that, except that as opposed to any other tool that is out there, agentic coding is approached by smart senior engineers that would otherwise spend time reading documentation and understanding a new package/tool/framework before giving conclusions around it with “I spun up Claude code and it’s not working”. Dunno why the same level of diligence isn’t applied to agentic coding as well.

First question that I always have to such engineers is “what model have you tried?” And it always ends up being the non-SOTA models for tasks that are not simple. Have you tried Claude Opus?

Second question: have you tried plan mode?

And then I politely ask them to read some documentation on using these tools, because the simplicity of the chat interface is deceptive.

jmull 10 hours ago

It doesn't look like you addressed issues raised in the article. E.g., see the "my experiences interviewing candidates" section where we can see this isn't just a problem of the author's (just one example in one section of an article that covers various things).

I always wonder what the purpose of posting these generic, superficial defenses of a certain form of LLM-based coding is?

Narciss 9 hours ago

That was a different matter altogether. I agree though that I didn't touch on that.

My experience is different in that case, but it certainly depends on the type of technical challenge, the programming language, etc.

Candidates that perform better or worse exist with and without agentic coding tools. I've had positive and negative experience on both fronts, so I'd attribute the OP's experience to the N=1 problem, and perhaps to the model's jagged intelligence.

I work mostly in TypeScript, and it's well known that models are particularly well versed in it. I know that other programming languages are less supported because there is less training data for them, in which case models could be worse with them across the board (or some SOTA models could be better than others).

zazibar 10 hours ago

A "you're holding it wrong" with the implication that the author is a bad engineer as the cherry on top. Brilliant stuff.

Narciss 9 hours ago

Definitely didn't want to imply that the author is a bad engineer, quite the contrary he seems like a very good one. Apologies if it came across that way.

Just that many brilliant engineers like them test agentic tools without the same level of thorough understanding that they give to other software engineering tools that they test out.

mjburgess 10 hours ago

VibeTFM

Narciss 9 hours ago

hmm didn't get the pun...Time to turn to chatGPT :))

roughly 17 hours ago

The “Calm technology” thing always annoys me, because it skips every economic, social, and psychological reason for the current state of affairs and presents itself as some kind of wondrous discovery, as opposed to “the way things were before we invented the MBA.” A willing blindness to predators doesn’t provide a particularly useful toolkit.

pringk02 15 hours ago

I would be interested to hear you elaborate on this more. I feel like I almost get what you are saying but am not confident I actually understand.

roughly 4 hours ago

Yeah, so - the whole Calm Technology(™) feels like someone looked at the dopamine casino of modern tech and said "well, this is all wrong" - which, yes - and then proceeded to try to treat it like a design problem, which it is emphatically not. Not only are the people who made the dopamine casino aware of what makes "calm technology"(™), they're experts in it, because the entire design process of most modern tech is explicitly designed not to be "calm," because the entire economic incentive structure is pushing dopamine casinos. People aren't building "uncalm" technology by mistake, they're building it because the modern tech business structure and environment rewards addictive software.

If the "Calm Tech"(™) people/institute/whatever actually wanted to move the needle, they'd be lobbying for regulations, building tools for consumers to fight back, or trying to do anything at all that actually shifts the underlying institutional and incentive structures. As it sits, they're the equivalent of a recess monitor suggesting maybe the bully would be happier if he shared the toys with the other kids - and frankly, given the degree of branding around the whole thing, it all starts to smell more like "influencer" than "genuine attempt to improve technology."

cess11 13 hours ago

"I allow interview candidates to use agentic coding tools and candidates who do so consistently performed worse"

I have a similar impression. It seems to me that people get something that kind of works and then their interest runs out and they're left with a shallow understanding of the result and how it might be achieved. This seems detrimental to learning, which tends to happen when one is struggling.

"I strongly believe that chat is the least interesting interface to LLMs"

This is also something I agree with. When I work with databases, the best part is not sitting with an immediate client writing raw queries by hand.

AIorNot 15 hours ago

“Facet-based project navigation: You could browse a project by a tree of semantic facets. For example, if you were editing the Haskell implementation of Dhall the tree viewer might look like this prototype I hacked up”

^ This is a genius idea - someone add this to claude

plaguuuuuu 13 hours ago

At work we use Clean Architecture which is incredibly hard to browse, even though I've been there for 6+ months now and know where everything is, I have to use so much working memory to gather together the files for a feature slice (endpoint, command, command handler, etc).

I've thought for a while of building this exact thing as a vscode extension because of how utterly shit it is :D

I really want the source code!

Gabriel439 8 hours ago

Author here: the source code is linked in the post but it can be easy to miss: https://github.com/Gabriella439/facet-navigator

It's very rough, but I plan on cleaning it up soon (the cluster labeler still needs a lot of work) and writing another post about it soon

resize2996 10 hours ago

find the reflections in the rushing river

brightstep 12 hours ago

> A tool is not meant to be the object of our attention; rather the tool should reveal the true object of our attention (the thing the tool acts upon), rather than obscuring it

I think this is true of AI agents. What is the object of our engineering attention? Applications, features, defect resolution. Not code.

Gabriel439 8 hours ago

Author here: yeah, this is a good point and something I think about even outside the context of agentic coding.

I've also tinkered with this idea myself in the context of prompt engineering with my Grace Browser project (https://trygrace.dev/), which converts code to an equivalent dynamic web form live within your browser.

I do think it's useful to remember that code is not the end goal and is itself just another mediated interface to the actual goal: the product you're building. However, I think even if you cut code out of the picture the chat interface is still not necessarily the right interface for building a product. A great example of how to build a non-chat interface to product building (predating the AI boom) is Bret Victor's Inventing on Principle talk (https://www.youtube.com/watch?v=PUv66718DII) and there might be ways we can refresh the ideas in that talk for the AI era (although I still don't have any specific thoughts along those lines yet).

brightstep 6 hours ago

Totally agree about the chat interface. I like to say it’s “infinitely powerful and infinitely confusing.” A dangerous combination. And, arguing with myself, I think it’s fair to say the code is AN object of our attention, if not THE object. A common metaphor being applied to agentic coding is the invention of power tools. If AI is the drill, and the goal is a house, then the code is the framing.

OutOfHere 21 hours ago

Agentic coding doesn't make any sense for a job interview. To do it well requires a detailed specification prompt which can't reliably be written in an interview. It ideally also requires iterating upon the prompt to refine it before execution. You get out of it what you put into it.

XenophileJKO 18 hours ago

As someone that agentically codes A LOT: detailed specs are not required, but they are certainly one way to use the systems.

If you are going to do a big build out of something, spec up front at least to have a clear idea of the application architectural boundaries.

If you are adding features to a mature code base, then the general order of the day is: First have the Ai scout all the code related to the thing you are changing. Then have it give you a summary of its general plan of action. Then fire it off and review the results (or watch it, less needed now though).

For smaller edits or even significant features, I often just give it very short instructions of a few sentences. If I have done my job well the code is fairly opinionated and the models pick up the patterns well and I don't really have to give much guidance. I'll usually just ask for a few touchups like introducing some fluent API niceties.

That being said, I do tend to make a few surgical requests of the AI when I review the PR, usually around abstraction seams.

(For my play projects I don't even look at the code any more unless I hit a wall, and I haven't really hit a wall since Opus 4.5, though I do have a material physics simulator that Opus 4.5 wrote that runs REALLY slow that I should muck around in, but I'm thinking of seeing if Opus 4.6 can move it to the GPU by itself first.)

So if I were doing an interview with an interview question. I would probably do a "let's break down what we know", "what can we apply to this", "ok. let's start with x" and then iterate quickly and look at the code to validate as needed.

OutOfHere 4 hours ago

There is a real danger here during an interview of unfairly imposing one's style on others. I think it's great to share one's approach, but making it the only approach can lead to stagnation and lose out on picking ideas from alternatives.

zarzavat 20 hours ago

In the UK the driving test requires a portion of driving using a satnav, the idea being that drivers are going to use satnavs so it's important to test that they know how to use them safely.

The same goes for using Claude in a programming interview. If the environment of interview is not representative of how people actually work then the interview needs to be changed.

shash 18 hours ago

In the Before Times we used to do programming interviews with “you can use Google and stack overflow” for precisely this reason. We weren’t testing for encyclopaedic knowledge - we were testing to see if the candidate could solve a problem.

But the hard part is designing the problem so that it exercises skill.

adhamsalama 18 hours ago

We don't solve LeetCode for a living yet it is asked in interviews anyway, so nah, we don't have to use AI in interviews.

EdNutting 13 hours ago

You’ve just written the exact reason LeetCode is widely mocked as an interview technique. Its problems are not representative of most real world software, and engineers that train to solve them give a false impression of their ability to solve most other problems.

I’ve interviewed hundreds of engineers for software and hardware roles. A good coding test is based on self-contained problems that the team actually encountered while developing our product. Boil the problem down to its core, create a realistic setup that reflects the information the team had when they encountered the challenge, and then ask the candidate to think it through. It doesn’t matter if they only write notes or pseudo code, and it doesn’t matter if they reach the wrong conclusion. What it’s testing for is the thought process. The fact the candidate has to ask the interviewer questions as though the interviewer is effectively the IDE, is great! The interviewer experiences the engineer’s thought process first-hand. And the interviewer can nudge the candidate in the correct direction by communicating answers that aren’t just typical IDE error messages.

To validate these kinds of questions in advance, I’d often run them on existing team members that hadn’t already been exposed to the real challenge the problem was based on.

bitwize 5 hours ago

Leetcode's utility is not in showing you can solve real-world problems. It's used as a baseline to estimate how smart you are. Every shop prides itself on hiring smart people, and some only want the best of the best—your MIT and Stanford grads, etc. A smarter engineering workforce can not only solve the problems you have, they're better positioned to spot and avoid problems you haven't anticipated yet. Anyways, IQ testing as a condition of employment can open you up to legal liability, as IQ tests are horribly racist. Leetcode is a way around that.

tptacek 3 hours ago

Without commenting on the racial biases of IQ tests (we probably directionally agree), the idea that IQ tests in employment are legally risky is an Internet myth. The companies that offer employment-screening general cognitive tests have logo crawls of giant companies that use them.

They're not unusual because they're legally risky; they're unusual because they don't work well.

simonw 20 hours ago

How about bug fixing? Give someone a repo with a tricky bug, ask them to figure it out with the help of their coding agent of choice.

OutOfHere 14 hours ago

It doesn't have to be a "tricky" bug. A straightforward bug will do. If it's too tricky, the logic could be better off being rewritten.

charcircuit 20 hours ago

>which can't reliably be written in an interview

Why not? It sounds like a skill issue to me.

>It ideally also requires iterating upon the prompt to refine it before execution.

I don't understand. It's not like you would need to one shot it.

OutOfHere 13 hours ago

It's a time issue. Interviews hardly offer much time as it is. To ask for something that benefits from multiple iterations is probably not going to fit in the available time.

Zakodiac 19 hours ago

[dead]

wazHFsRy 18 hours ago

> Honestly the model that works best for me is treating agents like junior devs working under a senior lead. The expert already knows the architecture and what they want. The agents help crank through the implementation but you're reviewing everything and holding them to passing tests. That's where the productivity gain actually is. When non-developers try to use agents to produce entire systems with no oversight that's where things fall apart.

I tried to approach it that way as well, but I am realizing that when I let the agent do the implementation, even with clear instructions, I might miss all the “wrong” design decisions it makes, because if I only review and do not implement, I do not discover the “right” way to build something. Especially in places where I am not so familiar myself, and those are the places where it is most tempting to rely on an agent.

hibikir 18 hours ago

With Claude Code I live in plan mode, and ask it to hand me the implementation plan at the low level, with alternatives. It's rare for it to not give me good ones: better than the junior dev. Then the plan it has is already limited enough that, along with its ability to maintain code style, I will see code very similar to what I would have done. There are a couple of things in the .md file to try to make it take a step or two like the ones I would on naming, shrinking the diff, and refactoring for deduplication. It's not going to go quite as fast as trusting it all to work at a large scale, but it sure looks like my code in the end.

Zakodiac 15 hours ago

[dead]

refactor_master 11 hours ago

It’s interesting then to ask whether this will behave the same as big orgs. E.g. once your org is big and settled, anything but the core product and adjacent services becomes impossible, which is why you often see a 50-person company out-innovating a 5k-person company in tech (only to be bought up and dismantled, of course, but that’s beside the point).

Will agents simply dig the trenches deeper towards the direction of the best existing tests, and does it take a human to turn off the agent noise and write code manually for a new, innovative direction?

wazHFsRy 14 hours ago

I totally get your point and agree to an extent, though I have not yet been able to create that trust with the LLM. With human teams, yes; with LLMs, it feels like I still have to verify too much.

kittbuilds 7 hours ago

[dead]

shevy-java 14 hours ago

> I believe there is a lot of untapped potential in AI-assisted coding tools

Yikes.

By the way, the whole website is strange. Just the name alone "haskell for all".

Many years ago when I tried to learn Haskell (and wrote some haskell code that worked but it was sooooo much harder when compared to ruby or python), one of the few things that appeared early on, aside from the monad barrier, was that many haskell people said that Haskell is deliberately not for everyone. Back then this was when IRC was still en vogue, so I "heard" that via various discussions on #haskell.

I did not fully understand this part, because ... why would you write a language that only a few big brain people could use? I found that elitist and snobbish, even arrogant.

Only at a later time did I understand one part of the meaning. The "we don't want you here" also means "we don't want YOU to change haskell into some other new meta-variant". I understood this much better when some guys wanted to have ruby embrace types. Then I understood that people not only want to change a language but also want to ruin it; whether on purpose or because they prefer something else (such as their brain embraced types-only code bases) is a separate discussion. I still find the haskell attitude very elitist but I at least understand that they don't want everyone to use - and change - Haskell.

> For example, someone who was new to Haskell could edit a Haskell file “as Python” and then after finishing their edits the AI attempts to back-propagate their changes to Haskell.

I like the general idea behind "write in any language, have it work in EVERY language". But the whole AI movement seems to be more about dumbing people down, or making them lazy, in many ways. I have seen people use it to great effect, so I am not at all saying AI has no use cases. What I am noticing, however, is that it has made many normal folks super-lazy. They type on their smartphone, solution comes out, task finished, move on. That's not necessarily only bad, but it comes with trade-offs. My approach is much slower, but it is systematic and I am in full control of what is documented how and where.

> This is obviously not a comprehensive list of ideas, but I wrote this to encourage people to think of more innovative ways to incorporate AI into people's workflows

Oh he has achieved this in a different way. Now I have another reason to not want AI in my "workflows". The whole website also seems super-strange to me. Has he used AI to write the whole content and layout? It's hard to say because I don't know how it used to be in the past, but the paragraphs and the content seem so strange. I suspect he used AI to generate the layout too; and some of the content as well. We are losing "interaction" with real humans here too (ok ok, there is not a lot of interaction with regards to a static website, but if a blog is written by AI, then there is not really any possibility for interaction with a human - you could not even distinguish WHO wrote the content or made the decisions such as which style to choose and so forth; it looks very fake to me, at least in part. I typically don't see this with other blogs).

Gabriel439 14 hours ago

Author here: my pronouns are she/her

I did not use AI to generate my blog's content nor layout.

Also, the reason my blog is named "Haskell for all" is because I originally created my blog a long time ago to try to make Haskell more accessible to people and counter the elitist tendencies.

zeendo 9 hours ago

Her blog has lots of quality content that's been featured on HN several times - well before AI writing became a thing. Not that you should necessarily know that but just strong evidence that your intuitions here are way off on so many levels.

Your entire take is super strange and presumptive.

wazHFsRy 8 hours ago

On the flip side, this finally reads again like something that is written by a human for a human, so I'm very glad to get this kind of content.