<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Tosin Dairo]]></title><description><![CDATA[Dreaming and Engineering the Future!]]></description><link>https://blog.tosindairo.com</link><image><url>https://substackcdn.com/image/fetch/$s_!B82b!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62847603-5364-4323-afdd-3d49fb821266_1477x2048.jpeg</url><title>Tosin Dairo</title><link>https://blog.tosindairo.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 15 May 2026 11:11:32 GMT</lastBuildDate><atom:link href="https://blog.tosindairo.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Tosin Dairo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[tosindairo@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[tosindairo@substack.com]]></itunes:email><itunes:name><![CDATA[Tosin Dairo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Tosin Dairo]]></itunes:author><googleplay:owner><![CDATA[tosindairo@substack.com]]></googleplay:owner><googleplay:email><![CDATA[tosindairo@substack.com]]></googleplay:email><googleplay:author><![CDATA[Tosin Dairo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[High Urgency, High Bar!]]></title><description><![CDATA[Notes from the inference trenches]]></description><link>https://blog.tosindairo.com/p/high-urgency-high-bar</link><guid isPermaLink="false">https://blog.tosindairo.com/p/high-urgency-high-bar</guid><dc:creator><![CDATA[Tosin Dairo]]></dc:creator><pubDate>Tue, 02 Sep 2025 06:22:19 GMT</pubDate><enclosure 
url="https://api.substack.com/feed/podcast/172501042/097e46077ee00558f824b61b792fb807.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>I have been building in the AI domain for the last 7 years, and for the last two and a half I have been knee deep in generative AI. This is my first public blog like write&#8209;up in a while and I have decided to try something new with monologues. This blog is a more concise format of the above monologue speaking on where the field is racing towards, and where it&#8217;s quietly stalling on.</p><h3>The pain I keep running into</h3><p>Let me walk you through a typical day using AI powered workflows; I move between agentic coding tools and chat models. Run research in one system, simulate and prototype in another, and review in a third. All of a sudden I find myself doing tons of copy/paste every single time.</p><p>Most agentic tools or chat apps have added memory toggles and longer<strong> </strong>context<strong> </strong>windows, but the moment I cross product boundaries, my work evaporates. There is no real portability of context. The result is a fragile workflow stitched together by the clipboard. That&#8217;s not intelligence; that is an unconscious adaptive behaviour.</p><h3>The inference tax</h3><p>Let&#8217;s be honest: we are being oversold at the app layer. To get the best of each world, I&#8217;m nudged into multiple subscriptions and stacked usage fees. Meanwhile we hear that token costs are trending down to 0 but invoices don&#8217;t reflect that reality. There is a lot of profiteering around inference, while the fundamentals that would bend the cost curve architecture, democratisation, engineering efficacy aren&#8217;t being pushed hard enough on. 
The big winners in that gap are the large labs and hyperscalers.</p><h3>Progress without memory isn&#8217;t progress</h3><p>We <em>have</em> made real advances: vector databases, graph databases, better retrieval, better models, and clever caching mechanisms. But continual learning across workflows is still mostly theatre. Even with in&#8209;app memory, I can&#8217;t carry today&#8217;s research from ChatGPT to an agentic coding tool to a different workspace and expect it to <em>just work</em>. I fall back to screenshots and pasted notes, and start over. It&#8217;s productivity cosplay.</p><h3>High urgency, high bar</h3><p>If there is a theme to what I&#8217;m arguing for, it&#8217;s this:</p><ul><li><p><strong>High urgency</strong>: Treat inference affordability and context portability like P0 bugs for the entire ecosystem. They block real adoption and compound hidden costs.</p></li><li><p><strong>High bar</strong>: Measure progress by fundamentals, not vibes. Prioritise <em>interoperability, learning continuity, engineering efficiency,</em> and <em>customer&#8209;validated outcomes</em>. Don&#8217;t declare victory without meeting these standards.</p></li></ul><h3>What I want to see (and help build)</h3><ol><li><p><strong>Context portability by default</strong><br>Your research, notes, and working state should be addressable and importable across providers, securely and with your consent, without manual glue. Memory must be a <em>user primitive</em>, not a product lock&#8209;in lever.</p></li><li><p><strong>Real continual learning for workflows</strong><br>Beyond long prompts: models that <em>retain</em> and <em>refine</em> task&#8209;specific knowledge across sessions and tools, with auditability and controls. This is the difference between good chat and compounding value.
Prefill caching, in&#8209;context learning with verifiable feedback, and adapter tuning should be complementary here while we keep researching more algorithmic breakthroughs.</p></li><li><p><strong>Inference economics that reflect reality</strong><br>Push on architecture, scheduling, and caching strategies that lower effective cost.</p></li><li><p><strong>Engineering efficacy over vibe coding</strong><br>Natural language to code is powerful, but when it becomes <em>vibe coding</em>, we lose rigour. Ship systems that shorten the path to <em>reliable</em> software (tests, tracing, reproducibility), not just generated programs without engineering fundamentals.</p></li><li><p><strong>Research &#8594; product loops everywhere</strong><br>One thing we&#8217;ve nailed is compressing the cycle from paper to product. But we need to double down on putting products in users&#8217; hands faster, with enough feedback rails in the product to serve as an instrument for truth, then feeding that signal back into model and system design. Let customers set the scoreboard.</p></li></ol><h3>Why the bar matters</h3><p>Most of the time we find ourselves optimising for local maxima: one model, one app, one neat demo. But the real value shows up only when intelligence compounds across <em>systems</em>. If I can hand work off between tools without losing context, and if the cost of doing that is sane, then AI stops being a novelty and starts behaving like infrastructure.</p><p>Until then, we&#8217;ll keep paying the inference tax while living in a copy/paste culture that pretends to be memory.</p><p>I&#8217;m optimistic because the path isn&#8217;t mystical; it is engineering. Set a higher bar. Move faster on the parts that actually unlock compounding value: context portability, continual learning, and honest economics. That&#8217;s the urgent work.</p><p><em>Look out for my next monologue: why every piece of software should come with a reinforcement&#8209;learning environment.
<strong>The age of experience means products get to improve with us</strong>.</em></p>]]></content:encoded></item></channel></rss>