How Developers Are Experimenting With LLMs Beyond Simple Code Completion

Rare Ivy Staff Writer · Jul 02, 2026

When autocomplete stops feeling like coding

The complaint that kicked off the discussion was plain enough: even with strong assistants like Claude Code and Codex, the workflow can still feel jerky. A developer described the loop as a constant stop, review, reprompt cycle. The model spits out a chunk of code, the human checks it, the human nudges it again, and the whole thing starts to resemble a chatty junior dev who keeps walking back to your desk every few minutes. Useful? Sure. Smooth? Not really.

That gap matters because the criticism wasn’t mainly about model quality. Nobody was saying the systems are dumb or unusable. The frustration was about the interaction pattern itself. When coding turns into a chain of tiny approvals, the engineer gets pulled out of the problem space over and over. You’re no longer just building a feature. You’re also managing the rhythm of the assistant, which is a different job entirely. For a lot of people, that’s the part that feels off.

The issue may not be that the model can’t code. The issue is that the interface keeps asking the human to babysit the process.

That’s why the post opened the door to a bigger question: is anyone trying something genuinely different from the standard prompt-response setup? The usual pattern is familiar by now. Ask for something. Wait. Read the answer. Correct it. Repeat. It works, up to a point, but it also bakes interruption into the workflow. The original poster seemed less interested in yet another benchmark flex and more interested in a different shape of interaction, one that doesn’t force a developer to keep resetting context just to get a sensible next step.

One of the first ideas floated in response was a tab-style interaction model. The phrasing was casual, but the instinct behind it was easy to follow. Tabs, in this sense, aren’t about browser chrome for the sake of it. They suggest a more continuous workspace, where the model can stay present while the human flips between code, notes and side tasks without starting from scratch each time. That kind of setup feels closer to how people already move through a codebase. You open a file, compare versions, jump to a test, glance at an error, come back, then keep going. The goal is to make the assistant feel less like a separate conversation and more like a persistent part of the editor.

There’s a reason that idea comes up so often in product chatter around developer tools. People don’t usually want more ceremony. They want fewer tiny frictions. If the assistant is making you restate the same intent five times, it doesn’t matter much that the underlying model scored well or generated nice-looking code. The experience still feels choppy. And in daily use, choppy gets old fast.

A related concern showed up almost immediately: are there coding-focused models that actually get close to the best autocomplete tools? That question sounds simple, but it points to a real split in how developers judge these systems. Some care about broad reasoning, planning and agent-like behavior. Others just want the thing that fills in the next line fastest and with the least nonsense. If a model can write a decent multi-file patch but stumbles when you ask it to complete a half-typed function, that’s not a minor detail. For many people, autocomplete is still the front door.

The interesting part is that the ask wasn’t framed as “which model is smartest?” It was closer to “which one feels good to use all day?” That distinction tells you a lot about the current state of developer tools. In the same way a music app or note app can be technically fine and still annoy the people using it, coding assistants can be strong on paper and clumsy in the hand. The conversation in this thread wasn’t really about raw power. It was about whether the tool respects the pace of actual work.

Seen that way, the thread reads like a small snapshot of digital culture inside software teams. Developers aren’t just comparing outputs anymore. They’re comparing how much the tools interrupt their attention, how often they have to re-explain themselves, and whether the assistant helps preserve momentum or kills it. That’s a different bar. It also explains why the basic prompt-response model is starting to look dated to some people, even when the underlying models keep getting better.

The next part of the discussion moved away from frustration and into tactics. But the first complaint set the tone: a lot of developers don’t want a smarter chatbot around their code. They want something that stops making them feel like they’re constantly restarting the same conversation.

Specs, skills, and the case against bulky orchestration

Once the complaints about choppy back-and-forth gave way to actual proposals, the conversation got a lot more interesting. A few developers seemed to land on the same conclusion: if an LLM keeps wandering, the first fix is usually not more orchestration. It’s a tighter brief. That’s a quieter answer than the current fever for agentic coding, but it may be the one that survives contact with real developer workflows.

One commenter described a spec format that’s almost aggressively plain. Spell out the intent, define the input and output, list the constraints, and say what has to be true before the model starts. No fluff, no heroic framing, no “just make it clean.” In practice, that means the model gets a clear target instead of a vague vibe. If you’re using AI coding tools like Claude Code or OpenAI Codex, this kind of brief can feel less like prompting and more like handing over a compact contract. The model still has room to work, but it doesn’t have to guess at what “done” means.
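The commenter's format maps naturally onto a small data structure. What follows is a minimal sketch, not the commenter's actual template: the `TaskSpec` class, its field names and the example task are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """A plain task brief: intent, input, output, constraints, preconditions.

    Hypothetical structure; the thread did not publish a concrete format.
    """
    intent: str
    input_desc: str
    output_desc: str
    constraints: list[str] = field(default_factory=list)
    preconditions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as plain text to paste at the top of a prompt."""
        lines = [
            f"Intent: {self.intent}",
            f"Input: {self.input_desc}",
            f"Output: {self.output_desc}",
        ]
        sections = (
            ("Constraints", self.constraints),
            ("Must be true before starting", self.preconditions),
        )
        for title, items in sections:
            lines.append(f"{title}:")
            # An explicit "(none)" is clearer than an empty section.
            lines.extend(f"- {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)


# Illustrative task, invented for this example.
spec = TaskSpec(
    intent="Add retry with backoff to the HTTP client wrapper",
    input_desc="fetch(url) currently raises on the first failure",
    output_desc="fetch(url) retries up to 3 times, then raises",
    constraints=["No new dependencies", "Existing tests keep passing"],
    preconditions=["The test suite is green on main"],
)
print(spec.render())
```

The point of the structure is that "done" is written down before the model starts, so a missing field is visible as a gap rather than discovered halfway through a session.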

A model that gets a cleaner spec usually beats a model with a bigger pile of context and a more elaborate marching band around it.

That same commenter pushed for another habit that sounds obvious until you watch people skip it: let the model ask questions before it starts. If the task is muddy, the best first pass may be a round of clarification, not code. Which files are in scope? What should happen when the input is missing? Are there style rules, performance limits, deployment constraints, or tests that must pass before merge? Those are boring questions, but boring is often where the time savings live. A model that asks before acting will usually waste less time than one that confidently invents the wrong shape of solution.
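One way to picture the "ask before acting" habit is as a gate in whatever harness sits around the model. The sketch below is an assumption about how such a gate could look, not something described in the thread; the key names and the `next_step` function are hypothetical.

```python
# Questions taken from the kinds of gaps the commenter listed.
# The dictionary keys are hypothetical names chosen for this sketch.
REQUIRED_ANSWERS = {
    "files_in_scope": "Which files are in scope?",
    "missing_input_behavior": "What should happen when the input is missing?",
    "must_pass_tests": "Which tests must pass before merge?",
}


def open_questions(brief: dict[str, str]) -> list[str]:
    """Return the questions whose answers are absent or blank in the brief."""
    return [
        question
        for key, question in REQUIRED_ANSWERS.items()
        if not brief.get(key, "").strip()
    ]


def next_step(brief: dict[str, str]) -> tuple[str, list[str]]:
    """Decide whether the session should clarify first or go straight to code."""
    questions = open_questions(brief)
    if questions:
        return "ask", questions
    return "code", []


print(next_step({"files_in_scope": "client.py"}))
```

With only the file scope filled in, the gate returns `"ask"` along with the two remaining questions, so the clarification round happens before any code is generated.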

There’s a broader point hiding in that advice. A lot of developers have spent the last year stuffing more and more context into prompts, then wondering why the model still trips over the edge cases. The cleaner move is to front-load the uncertainty. If the spec can expose ambiguity early, the rest of the session tends to go smoother. That matters in developer workflows where one wrong assumption can send you down a rabbit hole of rewrites, test failures, and the familiar “no, not that file” correction spiral. Nobody needs a model that is eager to be wrong.

Another developer in the thread took the same instinct even further and said they avoid built-in memory altogether. Their worry is simple: memory can blur intent over time. A long-running chat starts to accumulate half-remembered preferences, stale project details and the occasional nonsense that got accepted because everyone was moving fast. Instead, they keep notes in markdown files and let the model search them when needed. That keeps the record visible and editable. If the instructions change, they change in a file, not in a foggy layer of remembered context.
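The notes-in-files approach needs very little machinery. As a rough sketch, assuming the notes live as `.md` files under one directory and that a plain case-insensitive substring match is good enough, a search helper could look like this. The `search_notes` function and the sample note are invented for the example; the developer did not share their setup.

```python
import tempfile
from pathlib import Path


def search_notes(notes_dir: Path, query: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every note line containing query."""
    needle = query.lower()
    hits = []
    for path in sorted(notes_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits


# Demo against a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp)
    (notes / "decisions.md").write_text(
        "# Decisions\n- Use pytest, not unittest\n- Target Python 3.11\n",
        encoding="utf-8",
    )
    hits = search_notes(notes, "pytest")

print(hits)  # [('decisions.md', 2, '- Use pytest, not unittest')]
```

Because each hit carries a file name and line number, a stale decision can be corrected at its source instead of being argued with inside a chat.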

That approach lines up with the broader move toward external, readable state rather than invisible behavior. The same logic shows up in Anthropic’s Model Context Protocol, which is built around giving models structured access to outside tools and data instead of making the prompt do all the work. The thread’s version was more homespun, though. Less architecture diagram, more “put the decisions in a markdown file and stop trusting your own memory at 11 p.m.” It’s hard to argue with that. Files age better than vibes.

The same commenter wasn’t impressed by hooks either. In their experience, hooks haven’t done much to improve the actual coding loop. They fire, they log, they trigger little side effects, and then the core problem is still there. Skills, by contrast, got a better review. The reason is fairly plain: skills can add a new capability without dumping a pile of instructions into the main context every time. That means the model can pick up a repeatable procedure only when it needs it, instead of dragging the whole thing around from task to task.
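The mechanism behind that praise can be sketched in a few lines: keep a one-line summary of each skill in the base context and load the full procedure only when it is chosen. This is a toy model of the idea, not the API of Claude Code or any other product; `Skill`, `SkillRegistry` and the two sample skills are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    summary: str                   # one line, cheap enough to show every time
    load_body: Callable[[], str]   # full procedure, fetched only when used


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}
        self.loaded: list[str] = []   # which bodies were actually pulled in

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def index(self) -> str:
        """Short listing that goes into the base context for every task."""
        return "\n".join(f"{s.name}: {s.summary}" for s in self._skills.values())

    def use(self, name: str) -> str:
        """Load one skill's full instructions on demand."""
        body = self._skills[name].load_body()
        self.loaded.append(name)
        return body


registry = SkillRegistry()
registry.register(Skill(
    name="changelog-entry",
    summary="Write a changelog entry from a merged diff.",
    load_body=lambda: "1. Read the diff.\n2. Summarize user-visible changes.",
))
registry.register(Skill(
    name="bug-report",
    summary="Fill out the standard bug report form.",
    load_body=lambda: "1. Steps to reproduce.\n2. Expected result.\n3. Actual result.",
))

base_context = registry.index()          # two short lines, no procedures
procedure = registry.use("bug-report")   # full steps, loaded only now
print(base_context)
print(procedure)
```

The base context stays two lines long no matter how detailed each procedure is, which is the property the commenter was pointing at.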

GitHub’s docs draw a similar line in their agentic workflows and agent skills material. The split is useful. Agents are mostly about preserving context across a task, so the model doesn’t lose its place halfway through a long job. Skills are about teaching it how to do a specific thing. That might mean using a template, filling out a standard form, or running code and returning only the result that matters. In other words, agents help it remember where it is, while skills help it learn what to do next.

That distinction matters because it keeps the tool from turning into a junk drawer. If every new behavior is stuffed into the prompt or hidden behind a giant orchestration layer, the session gets heavier and the model spends more time carrying its own baggage. Skills are a cleaner bet when the task repeats and the steps are known. Agents make more sense when the work stretches across several turns and the model needs continuity. Those are different jobs, even if the marketing pages sometimes blur them together.

Google’s agentic chat pair programmer fits into the same family of experiments. So does GitHub’s coding agent for Copilot, which signals how quickly these ideas are moving from hobbyist experiments into product menus. The funny part is that the people actually using these systems don’t always sound dazzled by them. They sound selective. They want a model that can ask a sane question, follow a sharp spec, consult a markdown note, and pull in a skill when needed. That’s less glamorous than a fully autonomous code bot, sure. It’s also a lot closer to something a developer can live with for more than a week.

The search for a better interface: tabs, lazy loading, and fewer context traps

After the back-and-forth over specs, skills and whether agents are doing too much, the conversation in the thread moved toward a more awkward question: what should the interface even look like if the goal is to keep a developer in motion instead of making them babysit the model every few minutes?

A recurring answer was progressive disclosure. The idea is simple enough. Don’t dump every file, note and dependency into the model at the start, then hope it sorts through the mess. Show context only when it matters. Let the model pull in the next piece when it actually needs it. That approach came up in a few different forms, from tab-style workflows to systems that surface the next relevant chunk of information only after the current step is complete.

It sounds almost boring, which may be why it keeps coming back. A lot of prompt engineering today still treats context like a buffet tray. Load everything, then ask the model to make sense of the pile. For small tasks, that can work. For bigger software development jobs, it gets clumsy fast. The developer is left shuttling files, pasting snippets and pruning old instructions so the model doesn’t trip over stale ones. Nobody writes code for a living because they enjoy context housekeeping.

The smartest interface may be the one that asks for less all at once and knows when to wait.

One builder in the thread described a more ambitious take on that idea. They said they were working on a JSX-based templating language that could manage branching and context from either a spec or existing work. JSX is a familiar shape for many developers, so that choice makes practical sense. Instead of inventing yet another opaque orchestration layer, the system would use a structure people already recognize from frontend work and apply it to LLM programming.

The pitch here is less about cleverness and more about reducing friction. If the task branches, the template branches. If the model needs a test file, a design note, or a piece of prior code, the system can decide when to surface it. That leaves the developer with fewer manual steps and fewer moments of “wait, did I already tell it that?” On messy tasks, especially the ones that sprawl across multiple files and half-finished ideas, that sort of automation could save a lot of time otherwise spent piping context around by hand.
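The thread gave no details of the builder's JSX language, so the following is not that system. It is a small Python analog of the stated idea, "if the task branches, the template branches", with hypothetical `Text` and `Branch` node types and an invented task state.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Text:
    body: str


@dataclass
class Branch:
    condition: Callable[[dict], bool]
    then: list["Node"]
    otherwise: list["Node"]


Node = Union[Text, Branch]


def render(nodes: list[Node], state: dict) -> str:
    """Walk the template and keep only the branches that match the task state."""
    parts = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(node.body)
        else:
            chosen = node.then if node.condition(state) else node.otherwise
            parts.append(render(chosen, state))
    # Drop empty pieces so unused branches leave no blank lines behind.
    return "\n".join(part for part in parts if part)


template = [
    Text("Task: fix the failing retry test."),
    Branch(
        condition=lambda s: s.get("has_spec", False),
        then=[Text("Follow the spec in SPEC.md.")],
        otherwise=[Text("No spec exists. Infer intent from the existing code.")],
    ),
    Branch(
        condition=lambda s: s.get("tests_failing", False),
        then=[Text("Include the failing test output.")],
        otherwise=[],
    ),
]

print(render(template, {"has_spec": True}))
```

Starting from a spec or from existing work becomes a change in the state dictionary rather than a hand-edited prompt, which is the friction the builder said they wanted to remove.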

But the thread was hardly full of open arms for orchestration-heavy tools. Several commenters sounded skeptical, and their concern wasn’t hard to follow. Every extra planner, router, memory store and handoff layer adds more tokens, more latency, and more chances for the whole thing to wander off course. The promise is that the system becomes smarter. The fear is that it becomes a machine for generating overhead. If the model needs another model to tell it what to look at, and then another one to package the result, the software starts to feel less like an assistant and more like a committee.
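The skeptics' worry is ultimately arithmetic. The sketch below shows the comparison they were implying; every number in it is invented for illustration and none comes from the thread or from any measurement.

```python
def net_token_savings(
    baseline_tokens: int,
    orchestrated_task_tokens: int,
    layer_overheads: dict[str, int],
) -> int:
    """Tokens saved by orchestration after paying for its own layers.

    A negative result means the stack costs more than it saves.
    """
    overhead = sum(layer_overheads.values())
    return baseline_tokens - (orchestrated_task_tokens + overhead)


# Hypothetical figures, chosen only to show the shape of the trade-off.
layers = {"planner": 1500, "router": 800, "memory store": 2000, "handoff": 1200}
saved = net_token_savings(
    baseline_tokens=12_000,          # plain prompt-response session
    orchestrated_task_tokens=7_000,  # the task itself, with tighter context
    layer_overheads=layers,
)
print(saved)  # -500
```

In this made-up case the orchestration trims the task from 12,000 to 7,000 tokens but spends 5,500 on its own layers, so the session ends up 500 tokens worse off.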

That skepticism seemed to come from a very practical place. Developers are used to tools that fail in embarrassing, mundane ways. A setup can look elegant on a demo stage and still fall apart the moment it meets a repo with odd naming, stale docs, or a codebase held together by history and three urgent fixes. In that world, an orchestration stack has to earn its keep every single time. If it burns more tokens than it saves, people will notice. Fast.

One commenter pushed that point even further with a kind of reality check. If the current wave of orchestration really worked cleanly, the largest model makers would already be using it to replace big chunks of software work. They have the money, the talent, the compute, and the direct line to the frontier models. If a certain interface pattern were truly the best path, it would probably be showing up at scale already, not just in experiments and side projects. That argument isn’t ironclad, of course. Large companies can be slow, cautious and weirdly attached to internal process. Still, it lands because it’s plain. The people building the models themselves are the first ones who would love a reliable shortcut.

So the thread ends up circling the same tension from different angles. Developers want systems that feel continuous, not stop-and-start. They want the model to ask for what it needs, when it needs it, without forcing a full reset of the conversation each time. They want context that arrives just in time, not a mile before or after it’s useful. And they want all of that without paying for a pile of orchestration that eats up tokens and attention.

That leaves the real question hanging in the air: what would a coding interface look like if it preserved momentum instead of interrupting it every few minutes?
