Introducing Dropstone 1.5: Frontier Coding Intelligence, For Everyone

The most capable AI coding tools today cost $200 a month, with rate limits. We don't think frontier coding intelligence should be a luxury good. It should be infrastructure, available to every developer who writes code for a living. Today we're announcing Dropstone 1.5, a coding agent that outperforms Claude Opus 4.7, GPT-5.5 Pro, and Gemini 3.1 Pro on real-world GitHub issues (SWE-bench Verified), runs exclusively on US-hosted infrastructure, and supports a full-time coding workload on every plan. Not just the top tier. This post walks through what we built, how it benchmarks, what it costs, and the soft-launch numbers from the first 12 hours.

Over the last year the same headlines repeated almost monthly. AI labs raise prices. Features move behind enterprise tiers. Existing plans get throttled. Power users on Claude Code Max at $200 a month report hitting rate limits inside three hours. Cursor users watch usage allowances shrink while monthly costs creep upward. GitHub Copilot tightens its enterprise gate.

Meanwhile, in the parallel universe of open-weight foundation models, the gap that was supposed to be permanent kept closing. DeepSeek V4 Pro now matches Claude Opus 4.7 on most agentic coding benchmarks. Moonshot Kimi K2.6 leads on tool use and long-context retrieval. These are not toy models. They are frontier-class systems with public weights, public methodology, and full reproducibility.

The two trends point in opposite directions. Closed frontier coding gets more expensive and more gated. Open-weight frontier coding gets more capable and more accessible. Almost no one is connecting them. That is the gap Dropstone fills.

Dropstone is a coding agent that runs on whichever open-weight frontier model is best each month, hosted exclusively on US infrastructure, with a runtime that does not trust the model. Three pieces. Each one matters.

Best open-weight frontier model, each month. Dropstone Pro 1.5 and Dropstone Fast 1.5 currently run on DeepSeek V4 Pro and DeepSeek V4 Flash.
Dropstone Heavy 1.5 runs on Moonshot Kimi K2.6. Dropstone Vision 1.5 runs on Gemini 3.5 Flash for vision passes. We re-evaluate the open-weight frontier every month and swap to whatever is best on the benchmarks we publish. You do not get locked into one lab's release cadence.

US-hosted, with zero data retention. Every inference call routes through US-based inference providers. The data_collection deny flag is enforced at every API call. Your prompts and code never leave US data centers, and they are never persisted by Dropstone or by the provider. The model weights run with no persistent state and no outbound network.

Runtime that does not trust the model. Every tool call (file edit, shell command, web fetch) requires your explicit approval. Output is text until you authorize action. Even a fully compromised model cannot harm a Dropstone user without first defeating the user's own approval gate. This is the security boundary, and it lives in the runtime, not the weights.

On SWE-bench Verified, the industry-standard benchmark for solving real GitHub issues from production repositories, Dropstone Pro 1.5 resolves 91.2% of issues. Claude Opus 4.7 resolves 87.6%. GPT-5.5 Pro resolves 85.0%. Gemini 3.1 Pro resolves 80.6%. The ground truth for each task is the maintainer's actual merged code. The results are cross-validated by Blankline Research, with full methodology published at joule.blankline.org.

Across the standard frontier-model coding battery, Pro 1.5 leads on coding accuracy (HumanEval 96.4%), coding completion (MBPP+ 94.0%), code reasoning (LiveCodeBench 93.5), competitive programming (Codeforces Elo 3,206), algorithmic coding (Apex Shortlist Pass@1 90.2), complex coding (BigCodeBench-Hard 71.0%), and agentic coding (SWE-bench Verified 91.2%). Claude Opus 4.7 edges ahead on SWE-bench Pro at 64.3% versus Pro 1.5's 62.0%. We are not going to hide that. It is one benchmark in eight, and we will close the gap in the next release cycle.

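To make the approval gate described above more concrete, here is a minimal Python sketch of the idea that model output stays inert text until the user authorizes it. This is an illustration only, not Dropstone's actual runtime: `ToolCall`, `run_with_approval`, and `ask_on_terminal` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical types and names: this is NOT Dropstone's runtime API.
@dataclass(frozen=True)
class ToolCall:
    kind: str    # e.g. "file_edit", "shell", "web_fetch"
    detail: str  # human-readable description shown to the user


def run_with_approval(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Execute a model-proposed tool call only if the approver says yes."""
    if not approve(call):
        # A denied proposal is never passed to the executor.
        return f"DENIED: {call.kind}: {call.detail}"
    return execute(call)


def ask_on_terminal(call: ToolCall) -> bool:
    """Interactive approver: accept only an explicit 'y' from the user."""
    answer = input(f"Allow {call.kind}: {call.detail}? [y/N] ")
    return answer.strip().lower() == "y"
```

In this sketch the decision sits entirely outside the model: the executor is only reachable through `run_with_approval`, and the default answer is "no". A real runtime would pass `ask_on_terminal` (or a UI equivalent) as the approver.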
Heavy 1.5 is built on Moonshot Kimi K2.6 and is the tier designed for agent swarms, multimodal coding, and long-context retrieval. It leads on multilingual coding (MultiSWE-bench 78.0%), agent swarms (MAS-Bench v2 76.5%), tool use (BFCL v2 90.1%), and long-context retrieval (LongBench-v3 128K, 88.0%).

Fast 1.5 competes against Claude Haiku 4.5, GPT-5 Mini, and Gemini 3 Flash. It leads on coding accuracy, coding completion, complex coding, repo-level coding, multilingual coding, and cross-file completion. Gemini 3 Flash leads on agentic coding and reasoning. Fast 1.5 is the default tier on the Pro plan because it gives the best dollars-per-task ratio of any small model on the market today.

Dropstone Vision 1.5 reaches 83.6% on MMMU-Pro, approaching the best-human-expert ceiling of 88.6%. It captions images attached through the CLI with SHA256 caching for sub-millisecond repeat lookups, so screenshots, diagrams, and charts become first-class context for any coding turn.

Raw benchmark accuracy only tells part of the story. A model that scores 95% but costs $1 per task and burns 500 joules of compute is not a better choice than a model that scores 88% at $0.08 per task. The Joule Index is Blankline's audit-grade composite benchmark. It combines three signals: dollars per task, joules per task, and Attention F1 (a measure of how closely the agent's output matches the maintainer's actual merged code). Every score is reproducible from public traces.

On the Joule Index, Dropstone Fast 1.5 ranks number one overall with a composite score of 0.883. Claude Haiku 4.5 is second at 0.825. Dropstone Pro 1.5 is third at 0.778. Claude Opus 4.7 is fourth at 0.703. Fast 1.5 achieves this because it costs $0.082 per task (12.6x cheaper than Claude Opus 4.7), uses 224 joules per task, and matches the maintainer's code with a perfect Attention F1 of 1.000. Full methodology and traces are public at joule.blankline.org. If you want to reproduce the numbers, you can.

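The accuracy-versus-cost point above (95% at $1 per task versus 88% at $0.08 per task) can be worked through with a simple cost-per-resolved-task calculation. The sketch below is not the Joule Index formula, which is published separately; `cost_per_resolved_task` is a hypothetical helper, and it assumes every attempt costs the same and that a failed attempt is simply wasted spend.

```python
def cost_per_resolved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected dollars spent per successfully resolved task.

    Assumption for this sketch: attempts are independent and equally priced,
    so the expected number of attempts per success is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# The two hypothetical models from the paragraph above.
expensive = cost_per_resolved_task(1.00, 0.95)
cheap = cost_per_resolved_task(0.08, 0.88)

print(f"{expensive:.3f}")          # 1.053 dollars per resolved task
print(f"{cheap:.3f}")              # 0.091 dollars per resolved task
print(f"{expensive / cheap:.1f}")  # 11.6x more per resolved task

# Opus 4.7 cost per task implied by the "12.6x cheaper" figure quoted above.
implied_opus_cost = 0.082 * 12.6
print(f"{implied_opus_cost:.2f}")  # 1.03
```

Under these assumptions the more accurate model still costs more than eleven times as much for each issue it actually resolves, which is the trade-off a composite score is meant to surface.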
The composite score balances dollars per task, joules per task, and Attention F1 into a single value. Dropstone Fast 1.5 leads at 0.883, with Dropstone Pro 1.5 at 0.778. The full ranking includes Claude Haiku 4.5, Claude Opus 4.7, Gemini 3.1 Flash, Claude Sonnet 4.6, Gemini 3.1 Pro, and Dropstone Heavy 1.5.

At the entry tier, Dropstone supports 450 weekly heavy-coding turns at $15 a month. That is twice the usage of Claude Code Pro ($20, 225 turns), four times the average of every frontier coding CLI, and enough to support a full-time coding workload without hitting a wall.

For context, the math: one heavy-coding turn averages 15,000 input tokens and 800 output tokens. That is one full back-and-forth with the agent including repository context, tool calls, and the agent's response. 450 turns per week works out to 90 turns per day across a five-day work week, or one substantive coding interaction every five to six minutes during an eight-hour day.

At the top tier, Dropstone Max delivers 2,700 weekly heavy-coding turns at $75 a month. That is 2.7x ChatGPT Codex Pro (1,000 turns at $100), 6x Claude Code Max 5x (450 turns at $100), 5.4x Grok Build (500 turns at $99), and 6x Antigravity (450 turns at $100). At $75, that is $24 to $25 less than every comparable Max plan, with 2.7 to 6 times the usage allowance.

Cursor does not offer a comparable single-seat tier at $75 to $100 a month. Their entry plan starts at $20 with 52 weekly turns, and their business tier jumps to $40 per seat.

Twelve hours ago we opened Dropstone 1.5 quietly. No homepage announcement. No press outreach. No social posts. Just a quiet open of the gate to whatever developers happened to find us first. In those 12 hours, developers ran 13,945,820 tokens through Dropstone. That is 1.16 million tokens per hour. About 19,400 per minute. About 323 per second. The growth curve is still accelerating.

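The usage and throughput figures above are plain arithmetic, and the short Python check below reproduces them from the numbers quoted in this post. The variable names are ours, and the weekly token budget is a derived figure that assumes every turn matches the stated average of 15,000 input and 800 output tokens.

```python
# Soft-launch throughput, from the totals quoted above.
TOKENS = 13_945_820
HOURS = 12

per_hour = TOKENS / HOURS
per_minute = per_hour / 60
per_second = per_minute / 60

print(f"{per_hour:,.0f}")    # 1,162,152 tokens per hour
print(f"{per_minute:,.0f}")  # 19,369 tokens per minute
print(f"{per_second:.0f}")   # 323 tokens per second

# Entry-tier allowance: 450 heavy-coding turns per week.
TURNS_PER_WEEK = 450
turns_per_day = TURNS_PER_WEEK / 5              # five-day work week
minutes_between_turns = 8 * 60 / turns_per_day  # eight-hour day

print(f"{turns_per_day:.0f}")          # 90 turns per day
print(f"{minutes_between_turns:.1f}")  # 5.3 minutes between turns

# Derived: weekly token budget if every turn is an average heavy turn.
tokens_per_turn = 15_000 + 800
weekly_tokens = TURNS_PER_WEEK * tokens_per_turn
print(f"{weekly_tokens:,}")            # 7,110,000 tokens per week
```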
For organic usage with zero marketing on the launch day of a developer coding tool, this is the strongest opening we can find on record. Cursor took roughly three to four months to reach meaningful organic traction after launch, with YC backing and press coverage from day one. Replit had years of slow community building before any viral moment. Devin's launch was viral but demo-driven, with no real product shipped to real users on day one. Dropstone has real usage from real developers before a single press article has been written about it. That is the metric that matters for a soft launch, and we cannot find a stronger result for a developer tool.

The one honest caveat: tokens processed is not the same as users or revenue. We do not yet know whether this is 50 power users hammering Dropstone or 5,000 casual users trying it once. The shape of the growth curve suggests the latter, but that is still an inference rather than a measurement. We will publish the breakdown in two weeks once the signal is clean.

Some readers will ask: isn't Dropstone Pro built on DeepSeek? Isn't DeepSeek Chinese? Yes and yes. We will not pretend otherwise. Dropstone Pro 1.5 and Fast 1.5 are built on DeepSeek V4 Pro and V4 Flash. Dropstone Heavy 1.5 is built on Moonshot Kimi K2.6. These are open-weight foundation models trained by Chinese labs. Their training process, training data, and any embedded behaviors are not auditable.

This is not a limitation specific to Dropstone. Goldwasser et al. (2022) showed that a backdoor can be planted in a machine learning model in a way that is computationally infeasible to detect, so no party can prove that a foundation model it did not train is free of embedded behaviors. That applies to Anthropic with Claude. To OpenAI with GPT. To Google with Gemini. To every closed foundation model on the market. The difference with Dropstone is that we say this out loud. Most vendors imply the problem is solved when it provably is not. The technical answer to "is this safe?" lives in the runtime, not the weights.

Sandboxed inference: every call routes through US-hosted providers with data collection denied at the API layer, with no persistent state and no outbound network from the model itself.

Approval gate on every action: file edits, shell commands, and network calls all require explicit user approval. No model output is ever auto-executed.

Open methodology: all benchmarks and traces are public at joule.blankline.org.

For regulated environments where model provenance restrictions apply, a US-Origin Weights enterprise tier, built on US-trained open models, is available on request at enterprise at blankline dot org.

Dropstone 1.5 is the first release in a family that will iterate monthly. The model behind each tier will swap when something better arrives on the open-weight frontier. The pricing structure will not change. The runtime guarantee will not change. The published benchmarks will not move behind a paywall.

Our commitment is straightforward: every dollar of cost reduction we can wring out of the open-weight stack, you keep. Every benchmark improvement, you get on the same plan. Every infrastructure investment we make stays US-hosted. If you build software for a living, frontier coding intelligence should not be locked behind a $200 monthly plan. Today it is not.

Dropstone 1.5 is live. Try it today at dropstone.io. Full benchmarks, methodology, and traces are published at joule.blankline.org. For enterprise inquiries: enterprise at blankline dot org. We will publish a follow-up post in two weeks with the full breakdown of soft-launch usage data (users, repeat sessions, token distribution by tier) once the signal is clean enough to share honestly.