How AlphaPixel Uses (And Doesn’t Use) AI In Development

Some futuristic AI/code mishmash of a computer screen and AI node/graph network graphics. And a pair of eyeglasses.

How Have AI/LLM Coding Tools Impacted Software Development?

The emergence of AI systems (which we will also refer to as Large Language Model/LLM tools) in the 2020s is one of the biggest unanticipated game changers in software development (and virtually every other field). For decades AI research plodded along, making incremental progress with occasional practical breakthroughs, like speech-to-text (remember 2007’s astounding GOOG-411, which had shut down by 2010?). Approaches that seemed promising never seemed to pan out into tangible gains. I even saw an “expert system” construction tool on my Amiga 500, and I could never figure out what it did ( https://archive.org/details/manualzilla-id-5642118/mode/2up ). This all changed with a rapid surge following the breakthroughs of Transformers and Attention, leading to the first Large Language Models that seemed to produce semi-useful output, followed quickly by more massive training and new models that could synthesize what appeared to be novel concepts rather than just repeating pre-trained responses like a Markov Chain.

Early LLMs were terrible at programming. They had memorized how to recite Shakespeare, and could parrot out clever-sounding responses, but the deep thought and challenging requirements of programming were hilariously out of reach initially. Over time, however, as we tested all the leading models from OpenAI, Google and Anthropic, we started to see them getting better in some domains. Python coding improved quickly, probably driven by the vast amounts of open source Python code in the world (ironically, lots of it in the Machine Learning field). But C++ still proved to be a tough nut to crack for AI tools, much as it is for human programmers. C and C++ have long had a fearsome reputation for giving a programmer enough rope to hang themselves, and for providing an armory of "footguns", equipping an innocent coder or LLM with ample ways to shoot themselves in the foot.

But again, progress marches inevitably forward, and additional training and prompting in modern models have substantially improved the results. Agentic tools like OpenAI Codex and Anthropic's source-available-but-not-open-source Claude Code [ https://github.com/anthropics/claude-code ] (which, amusingly, can be used with backends other than Anthropic's Claude models) incorporate sophisticated tool-usage and planning capabilities. These advancements allow an agentic tool to churn away unattended (if you trust it!), iteratively planning, executing, testing and validating a solution on your behalf.

Our research into these tools has revealed a mixed bag, but still a bag containing some tools and toys that any programmer would be glad to find under their tree at the holidays. This blog post isn't going to reveal any breakthroughs, but we've recently encountered a number of developers who don't know the state of the art, so we'd like to make the landscape clear for all.

These tools unequivocally do work, and work effectively, under the right circumstances. They CAN save time, reduce effort, and improve quality and productivity. They can also fail spectacularly when not used properly, wasting time, resources and money, and causing other significant problems. Here we will outline what environments we successfully use them in, what procedures we use, and when we won't use such tools.

When are AI/LLM Tools Useful in Software Development and Programming Tasks?

Generally, when carefully guided, supervised and shepherded by an experienced practitioner who could do the work themselves. You can use an agentic tool to simply say "Make me a Tinder clone for cats", and it will happily grind away making a feline matchmaking website (Purrfect Match). Even your cat could vibe code that way! Cats love vibe coding.

But such tools often make spurious and ill-thought-out decisions, from the architectural planning phase to the final UX design. You might unknowingly end up with something built using an inappropriate or unsuitable language, toolkit, framework, OS/platform or architecture. Without guidance, the final product might use a framework that is out of date, simply because when the LLM was trained, there was ample source material teaching and recommending that choice. LLMs are still substantially built on corpora of text from the 2022 era (I jokingly call this B.C., for Before ChatGPT) to avoid the self-poisoning "dead Internet" problem, where AIs ingest material written by other flawed AIs. As well, without guidance, you’ll tend to end up with something that is just like everything else. AI tools are notorious for mimicry, producing the same designs and architectures they have been trained on.

Another qualification for recommending AI-written code is for code that is in a non-critical-production role. Is it a one-off tool needed to address some ephemeral problem? An uncommon data conversion process or a unique transformation that won't be part of a long-term maintained pipeline? These are excellent candidates, where "did it do the one job we needed it to do one time?" is the primary and sometimes only unit test, and long-term maintenance isn't a serious concern if you don't ever plan to use the tool again. Or "What if we had a tool to convert OpenType fonts to Commodore 64 Color PETSCII data?" It would be foolish for a qualified engineer to spend effort on this, but an LLM coding tool can make your wildest Quixotic endeavours practical. And if that specialized tool saves you time, just once, it's worthwhile.

Similarly, a great use of LLM/AI coding tools is research and experimentation. It is now practical to chase down a wild hair that was simply not cost-effective previously. "What would happen if we added ability X?" Now, instead of tasking a developer for a few weeks to flesh out a feature, a proof-of-concept implementation can be engineered in a short time, sometimes even just a day or two. If the test proves effective, it can become a prototype for a proper development effort, and a source of unit tests, benchmarks, specifications, and trade studies for a real implementation. These LLM-generated proof-of-concept implementations are often vilified by professional programmers because they are all too often accepted as production code. This is an anti-pattern. If a proof of concept proves its worth, the feature should then be evaluated, and a new set of specifications and requirements derived from the evaluation of the prototype. This should become the basis for a new engineering effort (it's fine if this is LLM-assisted) that proceeds afresh from first principles, incorporating all that has been learned so far, and following the normal development process. Let me reiterate. Raw AI/LLM proofs of concept should not generally be put into production. Too many organizations are whiffing this barrier.

A fairly compelling and feasible use of AI that we support wholeheartedly is code auditing and bug hunting. The security news space right now is rife with CVEs and tales of LLM tools (especially the mythical Claude Mythos) wreaking havoc on codebases, finding years-old or even decades-old mistakes and vulnerabilities that nobody had ever seen or even looked for. The absolute tedium of auditing a massive, complex codebase puts it beyond the resources of most coding/audit teams. But, given enough token budget, AI tools can just grind away like a gamer trying to earn coveted loot, tirelessly iterating and exploring. We personally have used AI/LLM tools effectively on several untouched codebases and found significant, non-trivial, actual code errors that no static analysis had ever uncovered. I'm not referring to simple typos, but rather functional/intent implementation flaws. And, of course, instances of the dreaded copy/paste anti-pattern. The errors we've uncovered this way were very hard to see. Even after being notified of them, we had to scrutinize the code to see a parenthesis that was grouping a formula wrong and disrupting the proper order of operations. The key was that the LLM inferred the intent of the code from the function name/prototype and comments, and could see that the implementation was actually faulty, even though it was valid, compilable code that ran without fault. It just produced the wrong result.
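To make that class of defect concrete, here is a small hypothetical example (not taken from any codebase we audited). The function names and the formula are assumptions chosen purely for illustration: the name and docstring state one intent, while a misplaced parenthesis quietly computes something else.

```python
def fahrenheit_to_celsius_buggy(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    # Valid code that runs without error, but the parenthesis groups
    # the wrong terms: this subtracts (32 * 5 / 9) from temp_f.
    return temp_f - (32 * 5 / 9)


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    # Intended grouping: remove the 32-degree offset first, then scale.
    return (temp_f - 32) * 5 / 9
```

Neither a compiler nor a typical static analyzer has any reason to object to the first version. Only a reader (human or LLM) who compares the stated intent against the arithmetic will notice that the two disagree.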

One more place where AI usage stands out advantageously is working on open-source code. Without throwing shade, open-source code is almost never provided enough resources to excel (not saying closed source is either, but...). It's often created and maintained by volunteers, sometimes augmented by sponsorship or collaboration from commercial partners seeking to achieve a goal that serves their interests. Almost never does a project earn full dedication, engineering resources and process from the start. Some efforts are simply never completed because they are neither funded nor fun. As a result, there are many opportunities to employ LLMs to scale effort in the direction of code auditing, test generation, and other quality work. It is worth noting that in 2026, new advanced models have aggressively found significant security issues across vast swaths of open source projects, with critical impact. (Obviously, the same likely applies to closed source as well; we just don't have as much insight into that, but be sure it happens.) Open source doesn't have confidentiality issues (see below) and can always benefit from more (useful) hands and eyes, though misguided application of LLM tools in the hands of well-meaning or greedy bug-bounty hunters has been a torment for many open source maintainers. Even poor Linus Torvalds is starting to look like a paragon of patience in the face of the LLM bug bounty tsunami.

When Should AI/LLM Tools NOT Be Used for Coding?

A major litmus test against using AI/LLM coding tools is, of course, confidentiality. Many situations will prohibit exposing intellectual property like proprietary code to a cloud-based commercial service. It is a common concern whether your interactions with LLMs are being used to further train those LLMs, and the answer is neither clear nor reassuring. Even if the policy now says "we won't" do such a thing, history has shown that companies in desperate situations or insolvency will make internal data available in liquidation that their prior customers never dreamed would be anything but confidential ( https://jmacweb.com/ai-news/ai-poop-analysis-app-offered-to-sell-me-database-of-its-users-poops-20260517 ). The existence of LLM cloud services that are foreign-owned or have murky allegiance (such as Tencent/Hunyuan and Alibaba/Qwen) will give any security officer serious pause and leave them searching for the security department's bottle of whiskey.

A large number of our customers cannot have any of their proprietary or legally restricted data (e.g., CUI, Controlled Unclassified Information, or data under even stronger restrictions) present in any uncertified cloud environment, much less an AI datacenter of unknown location where it co-exists with data from potentially anyone around the world. A significant category of dramatic information leaks has already occurred at AI tool providers, where internal mistakes or deliberate prompt injection have revealed stored data from other user sessions. This would be a catastrophic event with many codebases.

How Can One Mitigate Situations Where SaaS AI Coding Tools Shouldn’t Be Used?

A mitigating factor for this obstacle is the use of self-hosted LLM models instead of cloud-hosted SaaS models. The best flagship cloud models like Claude, Gemini and ChatGPT all run on hardware with impractically massive quantities of VRAM and are not available for download and local operation. There do exist, however, a number of downloadable models scaled, pruned and optimized for practical local execution on self-hosted hardware with 96GB of VRAM or even less. Non-NVIDIA hardware using unified memory architectures from vendors like AMD and Apple offers total-system RAM capacities of up to 512GB, which can be utilized by the LLM. Even though these might not be quite as breathtakingly fast as the newest and largest NVIDIA chips, they can still perform better in the long run simply because you can afford to run a large model that would be out of reach for you in the NVIDIA vendor space. The best locally hostable coding LLMs (DeepSeek, Kimi, MiMo, GLM) still require 500GB-2TB of VRAM in their "smartest" model size, but our tests have demonstrated that reasonable coding performance can be achieved with smaller models, quantized and precision-tuned, such as Qwen3-Coder-30B-A3B (48GB GPUs) and Qwen3-Coder-Next 80B (dual 48GB GPU configuration). These models lack some of the macro-scale planning expertise of the giant models, but can be used via a router to perform agentic coding and engineering tasks as planned and directed by a human or a larger LLM (potentially also self-hosted). In this way, scaling agentic coding is practical and available for codebases that have restrictive IP concerns.
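As a rough way to reason about which models fit on which hardware, the memory needed for the weights alone can be approximated as parameter count times bytes per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions, not our sizing procedure: it uses the nominal parameter counts from the model names, ignores the KV cache, activations and runtime overhead, and ignores the per-block metadata that real quantization formats add. The function name is hypothetical.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bytes_per_weight


# Nominal sizes only; real deployments need extra room for context and overhead.
for label, params_b in [("30B-class model", 30), ("80B-class model", 80)]:
    for bits in (16, 8, 4):
        gb = estimate_weight_memory_gb(params_b, bits)
        print(f"{label} at {bits}-bit: ~{gb:.0f} GB for weights alone")
```

Under these assumptions, the weights of a 30B-class model at 8 bits come to roughly 30 GB, and those of an 80B-class model to roughly 80 GB, which is why quantization level and GPU count have to be chosen together. The actual headroom required depends heavily on context length and the serving software.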

Such self-hosted LLM instances have the advantage of giving you complete control over the residency of your proprietary and restricted data. These models can run on a computer completely disconnected and air-gapped from the Internet, and you can ensure all traces of the sensitive data are purged prior to reconnecting. As well, self-hosted models don't charge you per token. You pay for the hardware up front, and the power to run it, but within those constraints, you can have as many tokens as you want for as long as you want. This can be very attractive for tasks that might need extensive churning, but don't need the advanced thinking of a flagship model that is billed as a SaaS/API. While there is potential for "foreign bias" in models like Qwen (it won't talk about things that are censored by the Chinese Government), running them in a restricted software environment allows us to ensure they don't try to exfiltrate any proprietary data outside their sandbox. It is an interesting thought exercise to consider whether any model might have hidden directives that could sabotage adversarial targets in an air-gapped environment, similar to how Stuxnet ( https://spectrum.ieee.org/the-real-story-of-stuxnet ) did its damage without a connection to the outside world.
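The trade-off between paying per token and paying for hardware up front can be sketched as a simple break-even calculation. This is a minimal illustration with a hypothetical function name and made-up inputs; it assumes a flat per-token API price and a flat electricity cost per token, and it ignores depreciation, staff time and differences in model quality.

```python
def break_even_tokens(hardware_cost: float,
                      api_price_per_million: float,
                      power_cost_per_million: float) -> float:
    """Number of tokens at which owned hardware costs the same as an API.

    All three inputs are in the same currency; the two prices are per
    million tokens.
    """
    saving_per_million = api_price_per_million - power_cost_per_million
    if saving_per_million <= 0:
        # Local generation never pays off if power alone costs as much as the API.
        raise ValueError("local power cost per token meets or exceeds the API price")
    return hardware_cost / saving_per_million * 1_000_000


# Placeholder figures for illustration only, not real prices.
tokens = break_even_tokens(hardware_cost=10_000.0,
                           api_price_per_million=10.0,
                           power_cost_per_million=0.5)
print(f"Break-even after roughly {tokens:,.0f} tokens")
```

Beyond the break-even point, every additional token costs only the electricity to produce it, which is what makes long-running, churn-heavy tasks attractive on local hardware.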

Can Mixed/Hybrid LLM Model Arrangements Be Used to Improve RoI?

In testing local self-hosted AI tools, we've experimented with hybrid mixed models. One particular combination that we had success with (on non-restricted code) was Claude Opus and Qwen Coder. Using a local LLM harness and routers, we directed the Claude Code tool to utilize Opus for planning and strategic tasks and delegate actual implementation and testing work to Qwen. Qwen is quite compliant at following instructions from Claude Opus -- it has no real knowledge that the directives came from a different (and smarter) model than itself. This allows us to maximize our RoI by employing the low-volume cost-restricted high-thinking abilities of the Claude Opus model running on a massive cloud SaaS architecture (at Anthropic) and the high-volume cost-effective hardware-paid-for token production of our local GPU systems. Local GPUs typically aren't going to be as fast as the massive cloud datacenters in terms of token production per second, but it doesn't matter. If the tokens are "free" (after the hardware investment), it can run autonomously all day and night and the end value is still astonishingly high.
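The division of labor described above can be sketched as a small routing table. This is an illustrative sketch only, not the harness or router configuration we used: the `Backend` type, the backend names, the endpoint URLs and the `restricted_code` flag are all hypothetical, and no real tool's API is represented here.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    TESTING = "testing"


@dataclass(frozen=True)
class Backend:
    name: str               # label only (hypothetical)
    base_url: str           # placeholder endpoint, not a real service address
    billed_per_token: bool  # True for metered cloud APIs


CLOUD_PLANNER = Backend("cloud-planner", "https://cloud.example.invalid/v1", True)
LOCAL_CODER = Backend("local-coder", "http://localhost:8000/v1", False)

# Low-volume strategic work goes to the cloud model; high-volume work stays local.
ROUTES = {
    TaskKind.PLANNING: CLOUD_PLANNER,
    TaskKind.IMPLEMENTATION: LOCAL_CODER,
    TaskKind.TESTING: LOCAL_CODER,
}


def route(kind: TaskKind, restricted_code: bool = False) -> Backend:
    """Pick a backend for a task; restricted code never leaves local hardware."""
    if restricted_code:
        return LOCAL_CODER
    return ROUTES[kind]


print(route(TaskKind.PLANNING).name)
print(route(TaskKind.PLANNING, restricted_code=True).name)
```

The `restricted_code` override reflects the constraint that the hybrid arrangement is only suitable for non-restricted code; anything confidential would have to be planned locally or by a human.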

So, “What’s the frequency, Kenneth?”

So, the final takeaway is this. AI/LLM agentic software development is clearly here, and unlikely to go away. In fact, it just keeps getting better, and ignoring this reality is like doubling down on buggy-whip manufacturing. AlphaPixel DOES use AI/LLM/agentic coding tools on a daily basis, for research, tool development, and, in very limited cases, for production coding. But it's always under careful guidance and guardrails, informed consent and careful review. We recommend the very same practices for others utilizing this technology.

AI Declaration

While this blog post was reviewed by AI tools, which also made editing suggestions, it was all handwritten by a real organic human, with all the issues that entails.
