HappyVibe
Discover a New World of Vibe Coding

Explore the Infinite Possibilities of Vibe Coding

Discover the latest AI-assisted development news, the most practical developer tools, and the most cutting-edge technology trends. HappyVibe is a one-stop information aggregation platform built for developers.

Popular Tools

The Most Popular Vibe Coding Tools

View All →
Cursor
Freemium
IDE

An AI-powered code editor built on VS Code

Visit Website
Windsurf
Free
IDE

An AI coding environment from Codeium

Visit Website
Lovable
Freemium
AI Platform

An AI-powered full-stack app building platform

Visit Website
v0
Freemium
UI Generation

An AI UI generation tool from Vercel

Visit Website
Bolt.new
Free
AI Platform

An AI-powered full-stack app development platform

Visit Website

Latest News

The latest developments in Vibe Coding

View All →

A reverse proxy sits in front of your backend servers and becomes the main point where all incoming requests arrive. Users never connect to your application servers directly, because the reverse proxy handles each request first. It decides which backend should process it, whether anything needs to be filtered or modified, and whether the request should reach the application at all.

Before diving into how reverse proxies work, it helps to understand what a proxy is and how the reverse version differs from it.

A forward proxy sits between the client and the internet and makes requests on behalf of the user. Instead of the client connecting directly to a website or API, the forward proxy handles the communication. Here's what happens when you browse through a forward proxy:

- Your request goes to the proxy first
- The proxy forwards the request to the destination
- The website only sees the proxy's IP, not yours
- You gain anonymity or bypass access restrictions

Forward proxies are common in corporate networks to control and monitor employee browsing. Individuals also use them to hide their identity or access region-blocked content. VPNs take this idea further with encryption and routing, but the core concept is the same.

A reverse proxy flips the idea entirely. Instead of acting on behalf of the client, it acts on behalf of the server. A reverse proxy sits in front of your backend servers, and every incoming request goes to it first. The backend servers are never exposed directly to users. It handles tasks like:

- Deciding which backend server should process a request
- Filtering, blocking, or modifying requests
- Handling caching and SSL
- Protecting internal architecture details

To make it clear: a forward proxy hides the client; a reverse proxy hides the server.

Reverse proxies exist because a single application server cannot handle everything on its own. Modern websites and APIs need faster responses, stronger security, protection against attacks, support for millions of users, zero-downtime scaling, and global availability. A reverse proxy becomes the control center that helps achieve all of this. Let's break down the major use cases.

A reverse proxy sits between the client and your backend servers. It receives every incoming request first, then decides which server should handle it. This lets you spread the load instead of dumping everything on one machine. Here's the flow:

Client → Reverse Proxy → Backend Server A/B/C

The reverse proxy uses load balancing strategies to determine which server should handle each request. Common strategies include:

- Round Robin – sends requests to each server in turn
- Least Connections – sends requests to the server currently handling the fewest connections

By evaluating the current load on each server, the proxy ensures requests are distributed efficiently, keeping your system fast, stable, and reliable even during traffic spikes.
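To make those two strategies concrete, here is a minimal sketch of the selection logic in Python. The backend pool and connection counters are hypothetical; real proxies such as NGINX or HAProxy implement this internally.

```python
import itertools

backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # hypothetical pool

# Round Robin: cycle through the pool in a fixed order
rr = itertools.cycle(backends)
def round_robin():
    return next(rr)

# Least Connections: pick the backend with the fewest in-flight requests
active = {b: 0 for b in backends}  # updated as requests start and finish
def least_connections():
    return min(active, key=active.get)

# Routing one request: pick a server, count it as busy while it works
server = least_connections()
active[server] += 1   # request starts
# ... the proxy would forward the request to `server` and stream the response ...
active[server] -= 1   # request finishes
```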
Directly exposing your backend servers to the internet is risky. A reverse proxy acts as a protective layer, sitting between clients and your servers and handling all incoming traffic first. Here's how it helps:

- Hides backend IP addresses – clients and attackers never see your real server IPs
- Blocks malicious traffic – filters requests with suspicious patterns
- Stops bots and scrapers – prevents automated attacks or unwanted crawlers
- Mitigates DDoS attacks – absorbs traffic spikes before they reach the backend
- Enforces authentication and rate limits – ensures only valid users or requests get through
- Adds or modifies security headers – strengthens HTTP response security
- Terminates SSL/TLS connections safely – centralizes encryption and reduces load on backend servers

When an attack occurs, it hits the reverse proxy first, keeping your application servers safe and isolated from direct exposure.

A reverse proxy can also store cached copies of pages, images, and even API responses. When another user requests the same content, the proxy serves it immediately, without contacting the backend server. How this helps:

- Faster responses – pages and APIs load more quickly
- Lower backend load – reduces database and CPU usage
- Handles traffic spikes – many users can be served from the cache simultaneously
- Reduces infrastructure costs – fewer requests reach your servers

Content Delivery Networks (CDNs) work on this principle. They are essentially large, distributed reverse proxies that deliver cached content globally, ensuring high speed and reliability for users everywhere.

In a microservices architecture, different endpoints are handled by separate services. A reverse proxy acts as a central gateway, routing requests to the correct service automatically. Example routing:

/auth → Auth service
/users → User service
/payments → Payment service

This allows clients to interact with a single entry point while the proxy distributes requests to the appropriate backend services. Even if your system is split across dozens of microservices, the reverse proxy keeps the architecture unified and manageable.
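A rough sketch of that gateway behavior: path-prefix routing reduces to a longest-prefix match over a routing table. The service names and ports below are hypothetical.

```python
# Hypothetical routing table mapping path prefixes to internal services
ROUTES = {
    "/auth": "http://auth-service:3001",
    "/users": "http://user-service:3002",
    "/payments": "http://payment-service:3003",
}

def resolve_backend(path: str) -> str | None:
    """Return the upstream service for a request path, longest prefix first."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return None  # no matching service: the proxy answers 404 itself

assert resolve_backend("/users/42") == "http://user-service:3002"
```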
Popular Reverse Proxy Tools and Servers

These tools sit between users and your backend infrastructure, handling routing, security, caching, load balancing, and traffic management. Each one has different strengths depending on whether you need high performance, easy configuration, horizontal scaling, or full-featured edge security.

1. NGINX – A high-performance reverse proxy known for speed, low memory usage, and flexible configuration. Commonly used for TLS termination, load balancing, caching, and API gateway patterns.

2. HAProxy – One of the most reliable load balancers on the planet. Extremely strong at handling huge traffic volumes, health checks, connection pooling, and advanced routing logic.

3. Traefik – A modern reverse proxy built for microservices and containers. It auto-discovers services from Docker, Kubernetes, Nomad, and Consul, which makes setup extremely simple for dynamic environments.

4. Apache HTTP Server – Used in older or enterprise systems where Apache is already installed. Offers reverse proxy capabilities via modules like mod_proxy, mod_ssl, and mod_cache.

5. Cloudflare (reverse proxy at the edge) – Works as a global reverse proxy in front of your entire application. Provides DDoS protection, caching, WAF rules, bot mitigation, and global routing without touching your servers.

A reverse proxy is a critical component of modern web architecture. Sitting in front of your backend servers, it enhances performance, security, scalability, and reliability. Almost every major website and API relies on a reverse proxy, even if users never see it. Understanding how reverse proxies work gives insight into how large-scale platforms stay fast, secure, and highly available, even under massive traffic and complex architectures.

girb-mcp is an MCP (Model Context Protocol) server that gives LLM agents access to the runtime context of running Ruby processes. Agents can connect to a paused Ruby process, inspect variables, evaluate code, set breakpoints, and control execution — all through MCP tool calls.

What it does

Existing Ruby/Rails MCP servers only provide static analysis or application-level APIs. girb-mcp goes further: it connects to running Ruby processes via the debug gem and exposes their runtime state to LLM agents.

Agent → connect(host: "localhost", port: 12345)
Agent → get_context() → local variables, instance variables, call stack
Agent → evaluate_code(code: "user.valid?") → false
Agent → evaluate_code(code: "user.errors.full_messages") → ["Email can't be blank"]
Agent → continue_execution()

Installation

```ruby
gem "girb-mcp"
```

Or install directly:

```
gem install girb-mcp
```

Requires Ruby >= 3.2.0.

Quick Start

1. Start a Ruby process with the debugger:

```
# Script
rdbg --open --port=12345 my_script.rb

# Or with environment variables
RUBY_DEBUG_OPEN=true RUBY_DEBUG_PORT=12345 …
```

girb-mcp works with any client that supports MCP (Model Context Protocol). It has been tested with Claude Code and Gemini CLI. For example, you can simply tell the agent "investigate this bug," and it will actually send requests, inspect runtime state, and identify the root cause:

You: The users list page is returning a 500 error. Connect to the debug session and find out why.

Agent: I set a breakpoint in the controller and sent a request. After inspecting variables at the stop point, I found a record with a nil name in @users (User ID: 42). The view calls user.name.uppercase, which raises a NoMethodError at that point.

The key here is the ability to see what's actually happening at runtime — something you can't get just by reading code.

How girb-mcp Differs from girb

girb, which I released recently, is a tool for humans to interactively call AI from within IRB or the Rails console. girb (Generative IRB) is an AI assistant for Ruby development that works with IRB, the Rails console, and the debug gem. Its features:

- Context awareness: understands local variables, instance variables, and runtime state
- Tool execution: the AI autonomously executes code, inspects objects, and reads files
- Autonomous investigation: the AI loops through investigate-execute-analyze cycles
- Multi-environment support: works with IRB, Rails console, and the debug gem (rdbg)
- Provider agnostic: use any LLM (OpenAI, Anthropic, Gemini, Ollama, etc.)

Quick start for girb:

```ruby
# 1. Install: gem install girb girb-ruby_llm
# 2. Set your API key: export GEMINI_API_KEY="your-api-key"
#    (or OPENAI_API_KEY, ANTHROPIC_API_KEY)
# 3. Create ~/.girbrc:
require 'girb-ruby_llm'

Girb.configure do |c|
  c.provider = Girb::Providers::RubyLlm.new(model: 'gemini-2.5-flash')
end
# 4. Run girb, then type a question and press Ctrl+Space, or use `qq <question>`
```

girb-mcp takes the same approach — "accessing the context of a running Ruby process" — and makes it available to LLM agents.
| | girb | girb-mcp |
| --- | --- | --- |
| Who uses it | Humans (interactive in IRB) | LLM agents (via MCP) |
| Interface | Commands in the REPL | MCP tool calls |
| How to run | girb or binding.girb | Add to MCP client config |

If girb is a tool for "humans debugging with AI assistance," then girb-mcp is a tool for "AI debugging autonomously."

There are already several MCP servers for Ruby/Rails, but they mainly focus on static analysis and application-level APIs (DB queries, route inspection, etc.). girb-mcp connects to a running Ruby process via the debug gem and exposes its runtime state to the agent.

Agent → connect(host: "localhost", port: 12345)
Agent → get_context() → local variables, instance variables, call stack
Agent → evaluate_code(code: "user.valid?") → false
Agent → evaluate_code(code: "user.errors.full_messages") → ["Email can't be blank"]
Agent → continue_execution()

The decisive difference from static analysis is the ability to actually evaluate and answer questions like "what value does this variable hold right now?" or "what's the result of user.valid??" In a dynamic language like Ruby, there are many bugs you can't figure out just by reading code, so I believe this approach is particularly effective.

How it works:

1. The debug gem (rdbg --open) exposes a socket on the target Ruby process
2. girb-mcp connects to that socket using the debug gem's protocol
3. Tool calls from the MCP client are translated into debugger commands, and the results are returned

MCP is an open standard developed by Anthropic — a protocol for connecting LLMs to external tools. girb-mcp uses the mcp gem to comply with this specification, so it works with any MCP-compatible client.

Inspection tools:

| Tool | Description |
| --- | --- |
| evaluate_code | Execute Ruby code in the stopped binding |
| inspect_object | Get an object's class, value, and instance variables |
| get_context | Get local variables, instance variables, call stack, and breakpoints all at once |
| get_source | Get the source code of a method or class |

Execution-control tools:

| Tool | Description |
| --- | --- |
| set_breakpoint | Set breakpoints by line, method, or exception class |
| continue_execution | Resume execution until the next breakpoint or termination |
| step / next / finish | Step in / step over / run until method returns |

Rails-specific tools are automatically added when a Rails process is detected:

| Tool | Description |
| --- | --- |
| rails_info | Show app name, Rails version, environment, and DB info |
| rails_routes | Show routes with filtering |
| rails_model | Show a model's columns, associations, validations, and enums |
| trigger_request | Send an HTTP request to the Rails app being debugged |

trigger_request temporarily disables CSRF protection for POST and other requests, so you can send requests without worrying about tokens.

When you ask AI to implement something, it writes tests too. And the tests pass. But are those tests actually correct? Tests written by AI only verify "the spec as the AI understood it." They don't necessarily verify the behavior the user intended. The tests pass, but when you actually run the app, it doesn't work the way you expected. I think this is a common experience in AI coding. With girb-mcp, you can go one step further beyond tests:

1. AI writes the implementation
2. AI writes and passes the tests
3. Use girb-mcp to actually run the app and verify it behaves as intended

For example, say you asked AI to implement "only admins can delete articles." After the tests pass, you can actually send a DELETE request and confirm that a regular user gets a 403 and an admin successfully deletes the article — verified through actual behavior.
Where before the story ended with "tests pass, now a human needs to manually verify," you can now delegate verification to the AI as well. Humans just need to look at the final working result and make a judgment.

Install the gem:

```
gem install girb-mcp
```

Add girb-mcp to your MCP client's configuration. For Claude Code (~/.claude/settings.json):

```json
{
  "mcpServers": {
    "girb-mcp": {
      "command": "girb-mcp",
      "args": []
    }
  }
}
```

For Gemini CLI (~/.gemini/settings.json), the configuration is identical. Any MCP client that supports STDIO transport can use a similar configuration. If you run it through Bundler, set command to bundle and args to ["exec", "girb-mcp"]. Requires Ruby >= 3.2.0.

Start a Ruby process with the debugger:

```
rdbg --open --port=12345 my_script.rb
```

Then just ask the agent: "Connect to the debug session and show me the current state." There's also a run_script tool, so you can let the agent handle launching the Ruby script itself.

girb-mcp comes with a command to start a Rails server in debug mode:

```
girb-rails  # Equivalent to RUBY_DEBUG_OPEN=true bin/rails server
```

Tell the agent: "Set a breakpoint on line 15 of app/controllers/users_controller.rb and send a GET request to /users/1." The agent will automatically handle the entire flow: setting the breakpoint → sending the request → inspecting variables at the stop point.

You can connect to Ruby processes inside Docker via TCP or Unix socket volume mounts. When connecting via TCP, you can browse and read files inside the container even without having the source code locally.

A few things to keep in mind:

- evaluate_code can execute arbitrary Ruby code. However, dangerous operations like file manipulation and system commands are restricted by the LLM agent's policies. girb-mcp is simply a "window to the debugger" and is designed to be used in combination with the agent's guardrails.
- The debug gem has no authentication. When exposing a debug port via TCP, bind to 127.0.0.1 or otherwise restrict access.
- Do not use in production. This is a tool for development and debugging purposes only.

girb-mcp is part of the girb family:

- girb — AI-powered IRB assistant (interactive, for humans)
- girb-mcp — MCP server for LLM agents (programmatic, for agents)
- girb-ruby_llm — LLM provider via ruby_llm
- girb-gemini — LLM provider via Gemini API

girb-mcp is still a work in progress. If you try it out and notice anything, please let me know! GitHub Issues: https://github.com/rira100000000/girb-mcp/issues. Any feedback is welcome — whether it's "this part is hard to use" or "I'd love to see this feature"!

Until 2024, "learning AI" mostly meant fine-tuning models. But here in 2026, the game has completely changed. The main battleground of AI development is no longer "improving model accuracy by 0.1%" but "designing architectures that give AI system-level permissions and let it execute tasks autonomously." In other words, the evolution from chatbots to digital workers.

Over the past year, the landscape around AI has shifted dramatically. Here are the key points of what's happening.

AI is no longer something that just answers questions in a chat box. It operates browsers, writes code, and fixes its own bugs — we have entered the era of AI acting as a "digital employee."

In projects like OpenClaw, the AI holds shell-level access. That means the skill of designing safe execution logic has become far more important than the skill of writing the perfect prompt.

No matter how capable the model, an agent won't work properly unless the closed loop of memory management, planning, and tool integration is well designed. System architecture is now what determines the real-world capability of an AI application.

From here, let me introduce the repositories that, after actually trying them, made me think "this is the real deal," organized by learning stage.

1. AI for Beginners — A 12-week AI curriculum published by Microsoft, covering everything from symbolic AI to neural networks in a systematic way. If you want to properly understand AI from the ground up, start here.
🔗 GitHub: microsoft/AI-For-Beginners

2. ML for Beginners — Also from Microsoft; the "encyclopedia" of machine learning. If you want to understand how AI actually makes decisions, mastering this material is the classic route.
🔗 GitHub: microsoft/ML-For-Beginners

3. Prompt Engineering Guide — Prompt engineering in 2026 is no longer about "writing clever questions." It is systematic engineering that combines Chain-of-Thought (CoT) and the ReAct framework. This repository is the best way to grasp the full picture.
🔗 GitHub: dair-ai/Prompt-Engineering-Guide

4. LLM Course — A full-stack roadmap covering fine-tuning, quantization, and RAG deployment. If you want to actually ship an LLM-based service to production, this is the most practical course.
🔗 GitHub: mlabonne/llm-course

5. Agents Course — Hugging Face's official guide to agent development. It teaches, systematically, how to build practical agents that call external tools to complete tasks. Also recommended if you're still asking "what even is an agent?"
🔗 GitHub: huggingface/agents-course

6. smolagents — The most talked-about lightweight agent framework of 2026. Its signature feature is writing efficient agent logic with minimal code. A perfect fit if heavyweight frameworks aren't your thing.
🔗 GitHub: huggingface/smolagents

7. OpenClaw — The big one for 2026. A true "digital worker" framework that can autonomously operate browsers and system environments. Also check out OpenClaw-RL, its feedback-based self-iteration module. Security note: this project requires high system privileges, so always run it in a Docker container or sandboxed environment.
🔗 GitHub: openclaw/openclaw

8. Aider — The definitive AI pair-programming tool. Inside your local Git repository, the AI edits code across multiple files. As a great example of an "agent integrated into the system," it can be adopted into a real development workflow right away.
🔗 GitHub: Aider-AI/aider

9. Claude Code Best Practices — A deep dive into how to integrate Claude's MCP (Model Context Protocol) capability into real development workflows. Required reading if you want to establish an AI-driven development style.
🔗 GitHub: shanraisshan/claude-code-best-practice

10. Awesome LLM Apps — A massive collection of real production code, from medical image analysis to autonomous game agents. For hands-on developers whose attitude is "enough theory, show me working code," this is a treasure trove.
🔗 GitHub: shubhamsaboo/awesome-llm-apps

For those thinking "ten is too many to choose from," here are my recommended routes:

- Beginners → Start with AI for Beginners to build your foundation
- Working engineers → Integrate Aider into your daily development and experience the power of AI pair programming
- Future-proofers → Deploy OpenClaw (in a safe environment!) and get a feel for the possibilities of an agent OS

AI is evolving from "a model that writes code for you" into "an agent that autonomously operates systems." Mastering these tools translates directly into your competitiveness as a developer. What agent frameworks do you use day to day? If you have recommended repositories, please share them in the comments!

I was reviewing the user database at a SaaS company where I'd just started consulting. The product had grown to 8,000 active users, and they were struggling with a mysterious problem: their email marketing campaigns were bouncing at 6.2%, just above the critical 5% threshold where ISPs start treating them like spammers. When I dug deeper, I found something worse than bad emails—I found systemic negligence in their validation strategy.

One user had signed up with john@gmail.com but the database contained john@gmai1.com (with a "1" instead of an "l"). Another had registered using temp@10minutemail.com, a famous disposable email service. Still others had used catch-all addresses like noreply@company.com that would accept any email sent to them, trapping the company's transactional messages in a black hole.

When I asked the engineering team why these weren't caught, they gave me variations of the same answer: "we validate with regex on the frontend," or "we validate after signup," or "we don't check for disposable emails." That's when I realized signup form validation is much more complex than most developers think. It's not just about regex patterns or timing—it's about understanding exactly when and how to validate, and what factors to check for. Today, I'll walk you through the three critical pitfalls that are probably costing your product right now.

Your frontend validation probably looks something like this. You've got a React component with a form, and you validate emails with a regex pattern before the user can submit:

```jsx
import React, { useState } from 'react';

function SignupForm() {
  const [email, setEmail] = useState('');
  const [errors, setErrors] = useState({});
  const [submitting, setSubmitting] = useState(false);

  // This is what most developers rely on
  const validateEmailFormat = (email) => {
    // RFC 5322 simplified regex (but still incomplete)
    const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return emailRegex.test(email);
  };

  const handleChange = (e) => {
    const value = e.target.value;
    setEmail(value);
    // Clear error when user starts typing
    if (errors.email) {
      setErrors({ ...errors, email: null });
    }
  };

  const handleSubmit = async (e) => {
    e.preventDefault();
    const newErrors = {};

    // Frontend validation
    if (!email) {
      newErrors.email = 'Email is required';
    } else if (!validateEmailFormat(email)) {
      newErrors.email = 'Invalid email format';
    }

    if (Object.keys(newErrors).length > 0) {
      setErrors(newErrors);
      return;
    }

    // If we get here, the email "looks valid" to us
    setSubmitting(true);
    try {
      // Send to backend
      const response = await fetch('/api/signup', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ email, password: 'placeholder' })
      });
      if (response.ok) {
        alert('Signup successful!');
      } else {
        const data = await response.json();
        setErrors({ email: data.error || 'Signup failed' });
      }
    } catch (error) {
      setErrors({ email: 'Network error occurred' });
    } finally {
      setSubmitting(false);
    }
  };

  return (
    <form onSubmit={handleSubmit}>
      <div>
        <label htmlFor="email">Email Address</label>
        <input
          id="email"
          type="email"
          value={email}
          onChange={handleChange}
          placeholder="you@example.com"
          disabled={submitting}
        />
        {errors.email && <span className="error">{errors.email}</span>}
      </div>
      <button type="submit" disabled={submitting}>
        {submitting ? 'Creating account...' : 'Sign up'}
      </button>
    </form>
  );
}

export default SignupForm;
```

This looks fine to a user. The regex catches obviously broken addresses like notanemail or test@, and the form feels responsive.
But here's what happens when you test it with real data:

- test@tempmail.com ✓ passes (but it's a disposable service)
- user@gmail.com.xyz ✓ passes (domain doesn't exist)
- john@company.com ✓ passes (but might accept ANY address—it's catch-all)
- admin@example.com ✓ passes (but it's a spam trap)
- contact@spam-list.io ✓ passes (but will damage your reputation)

The problem is that regex validation only checks syntax. It never touches reality. It doesn't contact the mail server, doesn't check if the domain exists, doesn't verify that the mailbox will actually accept mail. Your 95% regex-validated emails might include 8-10% that are legitimately problematic.
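Even the first missing check here, domain existence, is only a few lines. Below is a minimal sketch using the dnspython package (an assumption on my part; it is not part of this article's stack, and you would install it with pip install dnspython). It only proves the domain can receive mail at all, not that the mailbox exists or is reputable:

```python
# Minimal sketch: check that a domain publishes MX records.
# Assumes the dnspython package (pip install dnspython).
import dns.resolver
import dns.exception

def domain_accepts_mail(email: str) -> bool:
    domain = email.rsplit('@', 1)[-1]
    try:
        answers = dns.resolver.resolve(domain, 'MX')
        return len(answers) > 0
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer, dns.exception.Timeout):
        return False

print(domain_accepts_mail('john@gmail.com'))      # True: gmail.com has MX records
print(domain_accepts_mail('user@gmail.com.xyz'))  # False: domain doesn't resolve
```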
Here's the real cost: let's say you have 10,000 users and 8% have problematic emails. You send them your weekly digest. That's 800 bounces. Do it weekly for 4 weeks, and you've sent 3,200 emails that will bounce. Your sending reputation drops. ISPs watch this behavior. After a few weeks, Gmail starts filtering your entire domain to spam. Now your legitimate users don't receive your emails either, even though those addresses are perfectly valid.

I've calculated the cost at one company: they were losing roughly $15,000 per month in undelivered transactional emails (password resets, billing notifications) that users didn't receive because their sender reputation was poisoned by 8% bad data. That's the cost of trusting regex alone.

Once you realize frontend validation isn't enough, the natural instinct is to add backend validation. Many teams do this, but they make a timing mistake: they validate after the user has already been created in the database. Here's how this typically looks:

```javascript
// Node.js / Express backend
const express = require('express');
const bcrypt = require('bcrypt');
const app = express();

app.use(express.json());

// Database connection (pseudocode)
const db = require('./database');

// Simple validation function
function isValidEmailFormat(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}

// The signup endpoint
app.post('/api/signup', async (req, res) => {
  const { email, password } = req.body;
  try {
    // Validate format (frontend probably did this too)
    if (!isValidEmailFormat(email)) {
      return res.status(400).json({ error: 'Invalid email format' });
    }

    // Check if user already exists
    const existingUser = await db.users.findOne({ email });
    if (existingUser) {
      return res.status(409).json({ error: 'Email already registered' });
    }

    // Hash the password
    const hashedPassword = await bcrypt.hash(password, 10);

    // Create the user in the database
    // This is the critical moment—we're committing to this email
    const user = await db.users.create({
      email,
      password: hashedPassword,
      createdAt: new Date(),
      verified: false
    });

    // Only NOW do we send a verification email
    // But at this point, we don't know if the email actually exists
    await sendVerificationEmail(email);

    res.status(201).json({
      message: 'Account created. Check your email for verification link.',
      userId: user.id
    });
  } catch (error) {
    console.error('Signup error:', error);
    res.status(500).json({ error: 'Signup failed' });
  }
});

// This runs hours or days later
async function sendVerificationEmail(email) {
  // The user never receives this because:
  // - Email format was wrong (typo like gmai1.com)
  // - Domain is disposable and inaccessible
  // - Domain is a spam trap
  // But we already created the account!
  const verificationToken = generateToken();
  await db.verificationTokens.create({ email, token: verificationToken });
  return sendEmail({
    to: email,
    subject: 'Verify Your Email',
    html: `Click here to verify: https://example.com/verify?token=${verificationToken}`
  });
}
```

The problem with this approach is subtle but costly. You've created the user account before knowing whether the email is legitimate. This means:

- Your database is polluted with bad emails. You now have 10,000 users but maybe 800 of them have invalid emails. Cleaning this up later is expensive.
- Verification emails don't reach the user. They sign up, click the verification link... except they never receive the email because it's a disposable address or spam trap. They think your system is broken.
- You've already paid for storage and computing. Every invalid email in your database is a record you'll need to handle, validate, or clean up later.
- Your metrics are misleading. You report 10,000 signups, but only 9,200 are real. This makes it hard to understand your actual product-market fit.

The timing matters enormously. If you validate after signup, you're working with incomplete information. The user might never complete email verification. They might give up. They might have a bad experience and abandon your product.

The third pitfall is the most insidious, because it silently degrades your deliverability over time. You validate that an email exists, but you don't check whether it's legitimate—whether it's a real person you want in your system. Disposable email services like 10minutemail.com, tempmail.com, and guerrillamail.com are designed specifically to be temporary. Users sign up for them when they don't want to give you a real email address. Spam traps are even worse—they're email addresses that don't belong to real people but are monitored by ISPs to catch senders who validate against outdated or unreliable lists.

Here's what the impact looks like over time. Let's build a scenario where you're not checking for these:

```python
# Python backend with Flask
from flask import Flask, request, jsonify
import os
from datetime import datetime, timedelta
import smtplib

app = Flask(__name__)

# Simple email format check (same problem as before)
def validate_email_format(email):
    import re
    pattern = r'^[^\s@]+@[^\s@]+\.[^\s@]+$'
    return re.match(pattern, email) is not None

# Database connection
class Database:
    def __init__(self):
        # Pseudocode—in reality this would be SQLAlchemy or similar
        self.users = []

    def create_user(self, email, password_hash):
        user = {
            'id': len(self.users) + 1,
            'email': email,
            'password': password_hash,
            'created_at': datetime.now(),
            'verified': False
        }
        self.users.append(user)
        return user

db = Database()

@app.route('/api/signup', methods=['POST'])
def signup():
    data = request.json
    email = data.get('email', '').strip().lower()
    password = data.get('password', '')

    # Validation step 1: Format check only
    if not validate_email_format(email):
        return jsonify({'error': 'Invalid email format'}), 400

    # Validation step 2: Check if exists (but we're about to create it)
    existing = next((u for u in db.users if u['email'] == email), None)
    if existing:
        return jsonify({'error': 'Email already registered'}), 409

    # Create user immediately
    import hashlib
    password_hash = hashlib.sha256(password.encode()).hexdigest()
    user = db.create_user(email, password_hash)

    # Send verification email
    send_verification_email(email)

    return jsonify({
        'message': 'Account created. Verify your email to continue.',
        'user_id': user['id']
    }), 201

def send_verification_email(email):
    # This function will send to:
    # - temp@tempmail.com (temporary, user will lose access)
    # - admin@fake-domain.com (spam trap, damages reputation)
    # - catch-all@company.com (goes nowhere meaningful)
    # But we don't check for any of these
    verification_token = 'fake_token_' + os.urandom(16).hex()
    # Log that we're sending (we're not actually sending in this example)
    print(f'Verification email sent to {email}')
    print(f'  Token: {verification_token}')
    print(f'  Will reach user: {predict_email_reachability(email)}')

def predict_email_reachability(email):
    """
    This function is pseudocode to show what we're missing.
    A real implementation would check:
    - Is this a known disposable service?
    - Is this a spam trap?
    - Does the domain accept all addresses (catch-all)?
    - Is this a role-based email?
    """
    disposable_domains = [
        'tempmail.com', '10minutemail.com', 'guerrillamail.com',
        'mailinator.com', 'temp-mail.org', 'throwaway.email'
    ]
    spam_trap_domains = [
        'fake-domain.com', 'nonexistent-company.xyz', 'spamtrap.io'
    ]
    domain = email.split('@')[1]
    if domain in disposable_domains:
        return False  # Won't reach user
    if domain in spam_trap_domains:
        return False  # Will damage reputation
    return True  # Probably fine (wrong!)

# Example of what gets into your database
if __name__ == '__main__':
    test_signups = [
        ('john@gmail.com', 'password123'),         # Real
        ('temp@tempmail.com', 'password456'),      # Disposable
        ('admin@fake-domain.com', 'password789'),  # Spam trap
        ('bot@company.com', 'password000'),        # Role account
    ]
    for email, password in test_signups:
        print(f'\nSignup: {email}')
        data = {'email': email, 'password': password}
        # This would call signup() in a real app
        reachability = predict_email_reachability(email)
        print(f'  Can reach user: {reachability}')
        print(f'  But we created account anyway!')
```

Here's the damage this causes. Let's trace what happens over time:

Month 1: You have 1,000 users. You don't realize that 80 of them used disposable emails and 15 used spam trap addresses. You start sending transactional emails (order confirmations, password resets).

Month 2: Your bounce rate is 8%. The disposable users have lost access (those services delete inboxes after hours). The spam traps are flagging you as a problematic sender. Some ISPs start watching your reputation.

Month 3: You send a marketing campaign to your 1,000 users. 800 go to inboxes, 150 go to the spam folder, 50 bounce. Your metrics look bad.

Month 4: Gmail, Yahoo, and Outlook have reduced trust in your sender reputation. Even emails to real users are being filtered to spam more aggressively. Your product launches a feature and tries to notify users—only 60% receive the notification because of reputation damage.

The cost: if you'd validated properly at signup, that 8% bad data would have been 0%. You'd have 920 real users instead of 1,000 partly fake ones, but your email deliverability would be clean. That difference compounds—after a year, it could mean the difference between a successful email channel and a blocked one.

Here's how to fix all three pitfalls in one solution. We'll validate the email in real time during signup, before creating the user account, and check for disposable addresses and spam traps. First, the frontend stays simple and just collects the email.
Don't do fancy validation there:

```jsx
import React, { useState } from 'react';

function SignupForm() {
  const [email, setEmail] = useState('');
  const [password, setPassword] = useState('');
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState(null);
  const [success, setSuccess] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    setError(null);
    setLoading(true);
    try {
      // Send to backend for real validation
      const response = await fetch('/api/signup', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ email, password })
      });
      const data = await response.json();
      if (!response.ok) {
        // Backend will tell us if email is invalid, disposable, spam trap, etc.
        setError(data.error || 'Signup failed');
        return;
      }
      setSuccess(true);
      setEmail('');
      setPassword('');
    } catch (error) {
      setError('Network error. Please try again.');
    } finally {
      setLoading(false);
    }
  };

  return (
    <form onSubmit={handleSubmit}>
      <div>
        <label htmlFor="email">Email Address</label>
        <input
          id="email"
          type="email"
          value={email}
          onChange={(e) => setEmail(e.target.value)}
          placeholder="you@example.com"
          required
          disabled={loading}
        />
      </div>
      <div>
        <label htmlFor="password">Password</label>
        <input
          id="password"
          type="password"
          value={password}
          onChange={(e) => setPassword(e.target.value)}
          required
          disabled={loading}
        />
      </div>
      {error && <div className="error">{error}</div>}
      {success && <div className="success">Account created! Check your email.</div>}
      <button type="submit" disabled={loading}>
        {loading ? 'Creating account...' : 'Sign up'}
      </button>
    </form>
  );
}

export default SignupForm;
```

Notice how the frontend is dumb—it just sends the email to the backend. The real magic happens there. The backend is where we use BillionVerify to do real validation:

```javascript
// Node.js / Express backend with real email validation
const express = require('express');
const https = require('https');
const bcrypt = require('bcrypt');
const app = express();

app.use(express.json());

// Database pseudocode
const db = require('./database');

class BillionVerifyClient {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  async verify(email) {
    return new Promise((resolve, reject) => {
      const requestBody = JSON.stringify({ email: email, check_smtp: true });
      const options = {
        hostname: 'api.billionverify.com',
        path: '/v1/verify/single',
        method: 'POST',
        headers: {
          'BV-API-KEY': this.apiKey,
          'Content-Type': 'application/json',
          'Content-Length': Buffer.byteLength(requestBody),
          'User-Agent': 'MyApp-Signup/1.0',
          'Connection': 'keep-alive'
        },
        timeout: 5000
      };

      const req = https.request(options, (res) => {
        let data = '';
        res.on('data', (chunk) => { data += chunk; });
        res.on('end', () => {
          try {
            const result = JSON.parse(data);
            resolve(result);
          } catch (error) {
            reject(new Error('Invalid API response'));
          }
        });
      });

      req.on('error', reject);
      req.on('timeout', () => {
        req.destroy();
        reject(new Error('Validation timeout'));
      });

      req.write(requestBody);
      req.end();
    });
  }
}

const verifier = new BillionVerifyClient(process.env.BILLIONVERIFY_API_KEY);

// The signup endpoint—now with real validation
app.post('/api/signup', async (req, res) => {
  const { email, password } = req.body;
  try {
    // Step 1: Basic format check (just to catch obviously broken input)
    if (!email || !email.includes('@')) {
      return res.status(400).json({ error: 'Invalid email format' });
    }

    // Step 2: Check if already registered
    const existingUser = await db.users.findOne({ email: email.toLowerCase() });
    if (existingUser) {
      return res.status(409).json({ error: 'Email already registered' });
    }

    // Step 3: REAL VALIDATION via BillionVerify
    // This is the critical difference—we validate BEFORE creating the account
    let validationResult;
    try {
      validationResult = await verifier.verify(email);
    } catch (error) {
      console.error('Validation service error:', error);
      // Fail closed: if we can't validate, reject the signup
      // This protects your reputation
      return res.status(503).json({
        error: 'Email validation service temporarily unavailable. Try again later.'
      });
    }

    // Step 4: Apply business rules based on validation result
    if (validationResult.status !== 'valid') {
      return res.status(400).json({
        error: 'This email address does not exist or is not valid'
      });
    }
    if (validationResult.is_disposable) {
      return res.status(400).json({
        error: 'Please use a permanent email address, not a temporary one'
      });
    }
    if (validationResult.is_spam_trap) {
      // This is a security issue—someone might be testing our system
      console.warn(`Spam trap signup attempt: ${email}`);
      return res.status(400).json({ error: 'This email address cannot be used' });
    }

    // Step 5: Only NOW create the user account
    // We know the email is real, permanent, and not a spam trap
    const hashedPassword = await bcrypt.hash(password, 10);
    const user = await db.users.create({
      email: email.toLowerCase(),
      password: hashedPassword,
      createdAt: new Date(),
      verified: false,
      validationData: {
        // Store validation data for future reference
        isValid: true,
        isCatchAll: validationResult.is_catch_all,
        isRoleAccount: validationResult.is_role_account,
        validatedAt: new Date()
      }
    });

    // Step 6: Send verification email to an address we trust
    // The user can now actually receive and verify the email
    await sendVerificationEmail(email, user.id);

    res.status(201).json({
      message: 'Account created successfully! Check your email to verify.',
      userId: user.id
    });
  } catch (error) {
    console.error('Signup error:', error);
    res.status(500).json({ error: 'Signup failed. Please try again.' });
  }
});

async function sendVerificationEmail(email, userId) {
  // In a real app, this would use nodemailer or a service like SendGrid
  const token = generateVerificationToken(userId);

  // Store token in database
  await db.verificationTokens.create({
    userId,
    email,
    token,
    expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000)
  });

  // Send email (pseudocode)
  console.log(`Verification email sent to ${email}`);
  console.log(`Link: https://example.com/verify?token=${token}`);
}

function generateVerificationToken(userId) {
  const crypto = require('crypto');
  return crypto.randomBytes(32).toString('hex');
}

app.listen(3000, () => console.log('Server running on port 3000'));
```

This is the complete fix. Notice the key difference: we validate the email before creating the user account, and we check not just for existence but also for disposable addresses and spam traps. If any of those checks fail, the user never gets created, and your database stays clean.

Let me put a number on this. Let's say you're a SaaS product with a $99/month plan and 1,000 paying customers, churning 5% per month (pretty standard for SaaS), so you need a steady stream of new signups just to stay flat. Suppose your funnel brings in 50 signups per day; if you're growing, you need more. With bad signup validation, maybe 8% of those 50 daily signups are invalid. That's 4 users per day, 120 per month, 1,440 per year who seem to convert but actually don't. They never verify their email, never use the product, never pay.
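A quick back-of-the-envelope check of those numbers, with the 8% invalid share as the assumed input:

```python
signups_per_day = 50
invalid_rate = 0.08            # assumed share of bad signups

bad_per_day = signups_per_day * invalid_rate
print(bad_per_day)             # 4.0 phantom signups per day
print(bad_per_day * 30)        # 120.0 per month
print(bad_per_day * 30 * 12)   # 1440.0 per year
```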
You're spending money on onboarding emails that never arrive, support tickets from users who think your system is broken, and reputation damage from bounces. With proper validation, all 50 of those signups are real. Your verified user rate goes from 92% to 100%. Your email deliverability stays clean. Your support tickets from signup issues drop to nearly zero. It's not just about preventing bad data—it's about ensuring every user who signs up is actually a user who can use your product.

The code examples above are complete and ready to use. Here's your implementation checklist:

1. Sign up for BillionVerify at https://billionverify.com/auth/sign-up. You'll get 100 free credits to test with, no credit card required. Grab your API key and set it as an environment variable.
2. Integrate the Node.js code from the backend example into your signup endpoint. Test it with various emails: real ones, disposable ones, spam traps. You'll see immediately how it catches problems that regex would miss.
3. Once you're confident it's working, deploy it to production. Your bounce rates will drop. Your user database will be cleaner. Your email reputation will improve.

For detailed API documentation and additional options like batch validation, visit https://billionverify.com/docs. The difference between regex validation and real validation might seem small, but it compounds over time. Start today, and your future self will thank you.

Spoiler: I didn't do it alone. And I'm not ashamed to say so.

28 days ago I created the repository for Forge 4D – an open platform for building applications using a declarative markup language (SML), combining 2D UI, 3D scenes, and animation timelines, powered by the Godot engine. In those 28 days, this happened:

- 41,000 lines of C# were written – the SML parser, the SMS parser, the interpreter, the entire foundation
- All of it was then migrated to 19,000 lines of C++
- Two complete parsers were designed, built, and ported
- The repository broke. And got fixed.

If I told you I did that alone, you'd rightfully call me out. A senior developer would need one to two years for that. So let me be transparent about what actually happened – and why I think this is the future of software development.

I didn't use one AI. I used several – each for what it does best.

Codex wrote the bulk of everything. The SML parser – first in C#. The SMS parser and interpreter – also first in C#. Then the migration to C++. Codex is remarkably good at C#. Clean, structured, pattern-aware. But C++? That's where things got interesting.

The GDScript Incident

During the C++ migration, I noticed something strange. The codebase was growing in an unexpected direction. I dug in – and found that Codex had quietly written around 900 lines of GDScript. It had rebuilt the SML compiler. In GDScript. Inside a C++ project.

To be fair: it wasn't wrong, exactly. GDScript is valid inside a Godot project. Codex wasn't hallucinating – it just defaulted to what it knew best when C++ got hard. It solved the problem. Just not the problem I asked it to solve. This is the moment I brought Claude in.

Codex, best for: large-scale code generation, C#, migrations, boilerplate, repetitive patterns.

Claude's job was to fix what Codex had detoured around. The GDScript had to go. The C++ migration had to be done properly. Claude worked through the architecture, cleaned up the detour, and made sure the SML and SMS parsers were correctly implemented in C++. Beyond that, Claude was my thinking partner for every architectural decision: how SML and SMS should relate to each other, how the runtime should work, how to structure the separation of concerns.

Claude also committed directly to the repository. Yes, there's an AI in the contributor list – and I think that's a good thing. It's transparent. It tells the community: this project was built with AI assistance, and we're not hiding it.

Claude, best for: architecture, refactoring, debugging, fixing other AIs' mistakes, complex problem solving.

Forge 4D includes image processing capabilities. For this specific task, Groq was the right choice – fast inference, ideal for visual processing pipelines.

Groq, best for: image processing, high-speed inference tasks.

The GDScript incident is the clearest possible illustration of why this matters. Codex didn't fail. It produced working code. But it drifted – toward the path of least resistance, toward the language it was most comfortable with. Without oversight, a C++ project quietly becomes a GDScript project.

Think of it like a human development team. You wouldn't ask your backend architect to also do all the CSS. Everyone has strengths, and everyone has blind spots.

| Task | AI Used | Why |
| --- | --- | --- |
| Mass code generation (C#) | Codex | Excellent at C#, fast at patterns |
| C++ migration & architecture | Claude | Strong reasoning, catches drift |
| Image processing | Groq | Speed and inference performance |

Using the right tool for the right job isn't laziness. It's engineering judgment.
And knowing when to switch tools is a skill in itself.

For context: Forge 4D is a human-first platform for building software by describing intent, not fighting tools. You write SML (Simple Markup Language) to describe structure:

```
Window {
    title: "My First App"
    size: 1920, 1080

    Label {
        text: "Hello World"
    }
}
```

And SMS (Simple Multiplatform Script) to describe behavior:

```
on saveAs.clicked() {
    log.info("Save As clicked")
    // your save-as flow here
}
```

No framework lock-in. No hidden magic. Plain text that humans can read and AI can generate safely. That last part matters: SML is deterministic for AI. When Claude or any other model generates SML, the output is predictable, readable, and reviewable. No black boxes.

While building Forge, I also created ForgeSTA (Speech To Action) – a voice-driven workflow tool that converts speech into structured commands using Whisper.cpp:

Speech → Whisper → CLI → structured output → AI pipeline

I used it to write this article. I dictate, it transcribes, I structure. It's also the post-processing pipeline for a book I'm currently writing about exactly this topic: how to work with AI as a developer.

The irony wasn't lost on me when ForgeSTA crashed mid-session while I was debugging it with Claude. The error was a hardcoded rpath in the whisper-cli binary pointing to the old project folder name. We fixed it by rebuilding whisper.cpp with @loader_path so the binary is portable regardless of where the project lives. Even the debugging session for the tool that transcribed this article was an AI-assisted session. That's just how it works now.

If you're early in your career, this is the most important thing I can tell you: AI doesn't replace your judgment. It amplifies it. The GDScript incident is proof. Codex produced 900 lines of valid, working code – in completely the wrong language. Without a developer who understood what was happening, that would have stayed in the codebase forever. The AI didn't catch it. I did.

You still need to understand:

- What architecture makes sense
- Which tool is right for which problem
- When generated code is solving the wrong problem
- How to review and own what gets committed

What AI removes is the barrier between idea and execution. The months of boilerplate. The repetitive migrations. The "I know what I want to build but it would take forever" problem. In 28 days, with clear intent and the right AI tools, you can build what would have taken a solo developer years. But only if you stay in the driver's seat.

Some developers hide AI usage. I put it in the commit history. That's intentional. The community deserves to know how a project was built. And honestly? Knowing that a project used Claude for architecture, Codex for code generation, and Groq for image processing tells you something useful about the project's structure and decision-making. It's not a weakness. It's a workflow.

🔨 Forge 4D on GitHub
🎙️ ForgeSTA on Codeberg

Built with love, coffee, and a stubborn focus on simplicity – and a little help from some AI friends.

Introduction

Over the past couple of years, I have architected and delivered a significant number of agentic AI applications across enterprise environments. Many of these deployments ran on Azure infrastructure, using Azure Web Apps for lightweight agent endpoints and Azure Container Apps for more sophisticated multi-agent systems that required orchestration, scaling, and reliable session routing. In building these systems, I have repeatedly implemented the underlying foundations myself: credential vaults, memory pipelines, observability layers, and isolation mechanisms. After doing this enough times, you develop a clear understanding of both how long these pieces take to build and where the real production challenges tend to surface.

When I first evaluated Amazon Bedrock AgentCore, it was the first platform I encountered that appeared to address many of these challenges holistically: not just through surface-level abstractions, but with production-grade depth designed for real-world deployments. That practical experience is the perspective I bring to this blog.

Before we talk about AWS Bedrock AgentCore, we need to answer a more fundamental question: what exactly is an AI agent, and why is it so different from a regular chatbot or API call?

"An AI agent is a software system that uses a large language model not just to generate text, but to reason, plan, take actions, and work toward a goal, often across multiple steps, over time, with minimal human involvement."

Most people encounter AI through a prompt-response loop: type something in, get something back. That model is useful, but it is fundamentally passive. The language model sits in a box, waits to be asked, generates text, and stops. An AI agent is something entirely different.

Think of a brilliant expert locked in a room with no tools. They can give extraordinary advice, but they cannot act on it. Give that same expert a phone, a laptop, access to databases, and the ability to send emails, run code, and call APIs. They no longer just advise. They act, verify, execute, and report back. That is the agentic paradigm.

"An AI agent doesn't just answer your question. It takes on your objective, plans a path to achieve it, executes that plan, monitors its own progress, and self-corrects when things go wrong, without you directing each step."

Ask an agent: "Find our top three open support tickets today, check each against the known issues database, draft replies, and email them to the support team." A plain language model cannot do this: it has no access to your ticketing system, knowledge base, or email infrastructure. An AI agent handles the entire workflow end to end:

Step 1: Query the ticketing tool for today's open critical tickets
Step 2: Search the knowledge base for related known issues
Step 3: Reason about which tickets match which issues
Step 4: Draft personalized reply emails using the LLM
Step 5: Send those emails via the email API (possibly via a tool / MCP server)

The LLM is the reasoning engine. The tools are how the agent reaches into real systems. And it does not stop after one response: it pursues the objective through every step until the goal is met. The most critical characteristic of an AI agent, and the one most often glossed over, is that it is goal-driven, not prompt-driven.
Prompt-driven systems (plain LLMs) receive an input and produce an output. The interaction is complete. There is no awareness of a broader objective and no adaptation if the first attempt fails.

Goal-driven systems (agents) receive an objective and autonomously determine the steps, tool calls, and decisions required to achieve it. They persist, adapt, retry, and self-correct until the goal is met, or explicitly report that it cannot be.

Observe, Think, Act, Repeat: the agentic loop is the cognitive cycle every agent runs until its objective is achieved. Strands Agents, AWS's own open source framework, describes this as its core architecture: in each loop iteration the model is invoked with the prompt, agent context, and available tools, and it decides whether to respond in natural language, plan next steps, reflect on prior results, or select one or more tools to use. This loop continues until the task is complete.

Observe: The agent reads its current goal and decomposed sub-goals. It reviews all results from prior steps. It retrieves relevant short-term memory. It incorporates new information from the environment since the last cycle.

Think: The LLM reasons over accumulated context and available tools to determine the single best next action. It outputs either a tool call with exact parameters or, if the goal is satisfied, a final answer. Planning-capable models may first decompose the goal into an explicit sub-task sequence.

Act: If a tool call was selected, the framework executes the real function: calling an API, querying a database, running code, navigating a browser, or invoking any registered tool. If a human-in-the-loop (HITL) checkpoint is configured, the agent pauses and waits for approval. The result is captured and fed back into context. The action result becomes new input to the next Observe phase.

Repeat: Is the goal achieved? If yes, the agent produces its final answer and terminates. If no, the loop continues, potentially for dozens or hundreds of iterations on complex, long-running tasks.

ReAct (Reason + Act) is the foundational prompting pattern that implements the agentic loop. The agent produces a structured Thought describing its reasoning, followed by an Action (the chosen tool call), followed by an Observation incorporating the result. This interleaved reasoning-action format makes goal pursuit explicit, traceable, and debuggable. It is the pattern that AWS Strands Agents uses internally to power its model-driven approach.

Goal: Find top 3 critical GitHub bugs and draft fixes.
Thought: I need to query GitHub for open issues labelled critical.
Action: github_search(repo='company/api', label='critical', state='open', limit=3)
Observation: Returned issues #16, #5018, #18.
Thought: I have the issues. Now I need each affected code file.
Action: github_get_file(repo='company/api', path='src/auth.py')
Observation: File retrieved.
Thought: Enough context. Drafting fix for issue #16.
… [loop continues for all 3 bugs] …
Final Answer: Three bugs with proposed patches attached.

Self-correction and adaptive planning are what separate agents from fragile pipelines. When an action fails (a tool returns an error, a query finds nothing, a generated solution does not compile), the agent observes the failure, reasons about the cause, and adapts. A failed search triggers a reformulated query. An unavailable tool triggers a fallback. A complex goal gets decomposed into smaller, achievable sub-goals. When the agent determines a goal truly cannot be achieved, it surfaces a clear explanation rather than silently returning empty results.
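As a rough sketch of that loop (not Strands Agents' actual API; call_llm and the tool registry here are hypothetical stand-ins), the skeleton looks something like this:

```python
# Minimal ReAct-style agent loop sketch. `call_llm` and `TOOLS` are
# hypothetical stand-ins, not any specific framework's API.
def run_agent(goal, call_llm, TOOLS, max_steps=20):
    history = [f"Goal: {goal}"]           # Observe: accumulated context
    for _ in range(max_steps):
        decision = call_llm(history)      # Think: model picks the next action
        if decision["type"] == "final_answer":
            return decision["content"]    # goal satisfied, loop terminates
        tool = TOOLS[decision["tool"]]    # Act: execute the chosen tool
        try:
            observation = tool(**decision["args"])
        except Exception as exc:          # self-correction: failures are observed too
            observation = f"Tool error: {exc}"
        history.append(f"Thought: {decision['thought']}")
        history.append(f"Observation: {observation}")  # Repeat: feed back and loop
    return "Goal not achieved within step budget; reporting back instead of guessing."
```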
Goal-driven does not mean unsupervised. Production agents are designed with explicit human-in-the-loop checkpoints: moments where the agent pauses, presents its proposed action, and waits for approval before taking any irreversible step, such as sending emails, deleting records, initiating payments, or deploying code. AgentCore Runtime's bi-directional WebSocket streaming makes these pause-and-resume flows practical within long-running sessions, enabling real-time human collaboration without terminating and restarting the session.

Tools: How Agents Act on the Real World

Without tools, a goal-driven agent has nowhere to go. Tools allow agents to reach beyond language generation into real business systems.

- Read tools retrieve information: database queries, document reads, semantic search against knowledge bases, and API calls to Salesforce, GitHub, Jira, Slack, and any other SaaS tool.
- Write tools create or modify data: email senders, database writers, file generators, CRM updaters, ticket creators, calendar schedulers.
- Execution tools run processes: code interpreters, browser automation for web-based applications that have no API, and shell command runners.

The production challenge: a prototype might hard-code three tools. An enterprise deployment often needs fifty tools across ten SaaS platforms, each with its own authentication scheme, error patterns, and schema. Tool management becomes a major engineering project on its own.

Memory: How Agents Remember

Language models (LLMs) are stateless. Every API call starts blank. For an agent serving the same user across weeks of ongoing work, statelessness is a fundamental blocker.

- Short-term memory covers the active session: conversation history, task state, intermediate tool results, and reasoning steps. It requires intelligent summarization to manage the LLM's context window limits without losing the critical thread.
- Long-term memory persists across sessions. User preferences, past project outcomes, accumulated domain knowledge, and learned patterns must survive session end and be retrievable in future sessions. This requires extraction logic, persistent storage, and semantic retrieval.
- Episodic memory is the most powerful form: storing specific past experiences (what the agent tried, what worked, what failed, and what the outcome was) so it can recall and apply successful strategies in future similar situations. This is the mechanism by which agents genuinely improve over time.

Observability: How Agents Are Understood and Governed

When an agent produces a wrong output after twelve reasoning steps and seven tool calls, traditional logs tell you almost nothing useful. You cannot search for "sessions where the agent called the wrong tool" in standard APM tools.

"You cannot safely govern what you cannot observe. For AI agents in enterprise production, observability is not optional: it is the difference between a system you can audit and a black box waiting to cause a compliance incident."

Agent-native observability must capture the full reasoning chain in step-by-step order, every tool invocation with exact inputs and outputs, every LLM prompt and response with token counts, decision points where the agent chose between alternatives, failure attribution pinpointing which specific step caused a wrong downstream output, and token consumption per step for cost control. Without this, AI-assisted decisions in regulated environments cannot be explained, investigated, or defended.
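As an illustration of what agent-native trace data has to carry, here is a minimal record for a single step; the field names are illustrative, not AgentCore's actual schema:

```python
# Illustrative trace record for one agent step; field names are
# hypothetical, not any vendor's actual schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentStepTrace:
    session_id: str
    step: int                   # position in the reasoning chain
    thought: str                # the model's stated reasoning
    tool_name: str | None       # None when the step is a final answer
    tool_input: dict | None
    tool_output: str | None
    prompt_tokens: int          # per-step token counts for cost control
    completion_tokens: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A trace is an ordered list of these records per session, which makes
# questions like "which step produced the wrong value?" answerable.
```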
MCP servers bridge the agent and external data sources, and MCP solves the M×N integration problem.

MCP: The Universal Connectivity Standard (USB-C)

For years, every team connecting agents to external services built bespoke adapters: custom code per tool, per framework, per model. This created the classic M×N integration problem: if there are M agent frameworks and N external services, teams end up building M × N separate integrations. A LangChain Salesforce connector did not work with a Strands agent. Every framework switch meant rewriting all integrations. As the number of models, frameworks, and enterprise systems grew, the integration burden multiplied.

MCP, the Model Context Protocol, is the open standard that ended this fragmentation. Published by Anthropic in 2024 and now adopted across the industry by AWS, Microsoft, Google, and others, MCP defines a universal language for agent-to-tool communication. Instead of building M × N bespoke connectors, developers can build one MCP server for a data source, and any MCP compatible agent, regardless of framework or model, can connect to it immediately. In effect, MCP transforms the integration landscape from M × N complexity to reusable connectivity, much like USB-C standardized hardware connectivity across devices.

The MCP architecture is built around three roles:

MCP Host – the agent framework that initiates connections and sends tool requests
MCP Server – the lightweight connector process wrapping an external service
MCP Resources and Tools – the capabilities exposed: actions the agent can invoke, data sources it can read, and prompt templates it can use

By introducing a standard protocol layer, MCP removes the need to repeatedly rebuild integrations and enables true interoperability across agent frameworks, models, and enterprise systems.
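To show how small the server side can be, the following sketch uses the FastMCP helper from the official MCP Python SDK (pip install mcp); the ticket lookup tool is a hypothetical stand-in for a real enterprise data source.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-server")

@mcp.tool()
def get_open_tickets(project: str, limit: int = 5) -> list[str]:
    """Return the most recent open tickets for a project."""
    # Hypothetical placeholder: a real server would query Jira, GitHub, etc.
    return [f"{project}-{i}: example open ticket" for i in range(1, limit + 1)]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default; any MCP host can connect

Any MCP compatible agent, whether built on Strands, LangGraph, or CrewAI, can now discover and call get_open_tickets without a bespoke adapter.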
Across nearly every enterprise agent project, the same pattern appears. Before the agent logic can even be written, engineering teams must build a large amount of supporting infrastructure, including:

Session routing
Credential vaults
Memory extraction pipelines
Observability wiring
Multi tenant context isolation
Policy enforcement

In practice, a substantial portion of early development effort goes into these foundations before the agent's intelligence is implemented. Let's walk through the key engineering challenges that create this gap.

Traditional serverless platforms are designed for short lived, stateless workloads. Agents behave very differently. They often require long running, stateful execution environments that maintain context across many tool calls and reasoning steps. Supporting this requires infrastructure for:

Session routing
Per user state management
Lifecycle management
Dynamic scaling of execution environments

Constructing this infrastructure on top of general purpose compute platforms can become a significant engineering effort before any agent logic is written.

Enterprise agents frequently process sensitive user data. When thousands of users run concurrent sessions, strong isolation between sessions becomes critical. Without proper safeguards, a defect could potentially expose:

One user's data to another user
Information across tenants
Privileged credentials or tokens

Achieving secure isolation at scale requires carefully designed execution environments, container isolation, and strict identity boundaries, rather than relying solely on application level safeguards.

Agents rarely operate in isolation. They interact with external services on behalf of users, which introduces the need to manage authentication and authorization flows such as:

OAuth consent processes
Secure token storage
Automatic token refresh
Fine grained permission enforcement
Audit trails for every access

When agents integrate with multiple SaaS platforms across thousands of users, credential management becomes a full platform capability, not just a small feature.

Agents depend heavily on memory systems to function effectively.

Short Term Memory

Maintaining conversation context across long interactions often requires summarization pipelines that compress earlier dialogue while preserving meaning.

Long Term Memory

Persistent knowledge typically involves:

Information extraction pipelines
Vector storage
Semantic retrieval
Mechanisms to reconcile new information with existing knowledge

Each of these components introduces potential failure modes that can gradually degrade agent behavior if not carefully managed, particularly in multi-tenant environments.

Traditional monitoring tools measure metrics such as:

Latency
Error rates
Throughput

But production AI agents require deeper visibility. Engineers often need to understand:

Which reasoning step produced an incorrect output
Which tool call returned unexpected data
Why the agent chose a particular decision path

Achieving this level of visibility requires trace level instrumentation, structured logs, and AI aware observability dashboards.

Early agent systems often embed governance rules directly inside prompts. This approach is fragile: a carefully crafted user input can sometimes influence the agent to ignore or reinterpret its own instructions. Production systems therefore require external policy enforcement layers that evaluate permissions and constraints independently of the agent's reasoning process. This ensures governance cannot be bypassed.

Real enterprise workflows rarely rely on a single agent. Instead, they often involve multiple specialized agents working together. For example:

A research agent to gather information
A writing agent to generate responses
A verification agent to validate outputs
An approval agent to enforce governance

Supporting these workflows requires infrastructure for:

Inter agent communication
Shared state management
Workflow orchestration
Failure handling and retries

This coordination layer introduces yet another architectural component to an already complex system; a minimal sketch of such a pipeline follows.
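The sketch below shows the simplest possible version of such a coordination layer: a sequential pipeline with shared state and basic failure handling. The research, writing, and verification agents are hypothetical callables, not a real orchestration API.

from typing import Callable

Agent = Callable[[dict], dict]  # each agent reads and extends shared state

def run_pipeline(task: str, stages: list[tuple[str, Agent]]) -> dict:
    state = {"task": task}  # shared state visible to every agent
    for name, agent in stages:
        try:
            state = agent(state)  # each stage enriches the state
        except Exception as err:
            state["failed_stage"] = name  # record the failure and stop
            state["error"] = str(err)
            break
    return state

# Hypothetical usage:
# result = run_pipeline("Summarize Q3 incidents",
#                       [("research", research_agent),
#                        ("writing", writing_agent),
#                        ("verification", verification_agent)])

Real systems replace the sequential loop with parallel execution, retries, and inter agent messaging, which is exactly the infrastructure burden described above.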
Amazon Bedrock AgentCore is an agentic platform from AWS designed to build, deploy, and operate AI agents securely at scale. It provides a set of modular, enterprise grade services that handle the infrastructure required to run production grade AI agents without developers having to manage the underlying systems.

In real world deployments, building an agent is only a small part of the challenge. Production systems must manage runtime execution, memory, tool connectivity, identity, security, and observability before agents can reliably interact with enterprise data and services. These infrastructure concerns often become the primary barrier to moving from prototype agents to production systems.

Amazon Bedrock AgentCore addresses this challenge by providing fully managed services that remove the undifferentiated heavy lifting of building agent infrastructure. Developers can focus on implementing the agent's reasoning and workflows while AgentCore manages the operational backbone required to run agents reliably in enterprise environments.

AgentCore services are modular and composable, meaning they can be used together or independently depending on the architecture of the system. The platform is also framework agnostic and model agnostic, supporting popular open source agent frameworks such as LangGraph, CrewAI, LlamaIndex, and Strands Agents, and it can work with foundation models from Amazon Bedrock or external providers.

At a high level, AgentCore provides capabilities such as:

AgentCore Runtime: A secure serverless environment for running agents and tools
AgentCore Memory: Managed short term and long term memory for context aware agents
AgentCore Gateway: A service that converts APIs and services into MCP compatible tools for agents
AgentCore Identity: Identity and access management designed specifically for AI agents
Built in tools and observability: Including code execution, browser automation, monitoring, and evaluation capabilities

Together, these services form a production infrastructure layer for agentic systems, allowing teams to deploy AI agents that are secure, scalable, observable, and capable of interacting with real enterprise systems.

AgentCore Runtime is the secure, serverless execution environment for AI agents. Each user session runs inside a dedicated, hardware isolated microVM, providing strong isolation of CPU, memory, and filesystem resources. Isolation is enforced at the virtualization layer, ensuring one user's agent cannot access another user's data. When a session ends (after 15 minutes of inactivity, explicit user termination, or the 8 hour maximum session limit), the microVM is destroyed and memory is fully sanitized, preventing cross session data leakage.

Framework Compatibility

AgentCore Runtime is framework agnostic and works with common agent frameworks such as:

Strands Agents (AWS)
LangChain / LangGraph
LlamaIndex
Microsoft Agent Framework (AutoGen + Semantic Kernel)

It can also host any custom agent implementation that runs inside a container.

Minimal Integration

Existing agents can be deployed with a small wrapper:

from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    return your_agent(payload.get("prompt", ""))

Deployment starts with:

agentcore configure

AgentCore is model agnostic and works with major foundation models including:

Amazon Nova
Anthropic Claude
OpenAI GPT
Google Gemini
Meta Llama
Mistral

Your agent chooses the model; AgentCore only provides the execution environment.

Communication

AgentCore supports two interaction modes:

HTTP API – standard request/response execution
Bi directional WebSocket streaming – real-time conversational and multi-turn agents

Using a sessionId keeps requests routed to the same microVM session, preserving state.

Strands Agents

Strands Agents is AWS's open source agent framework designed around a model first approach. A Strands agent is defined by three elements:

Model
Tools
Prompt

The model drives planning and tool usage. Strands agents deploy to AgentCore Runtime using the same lightweight SDK wrapper.

AgentCore supports two deployment paths:

Direct code upload
Container deployment

Both use the same lifecycle:

agentcore configure
agentcore deploy

Deployments are immutable and versioned, allowing multiple versions and canary testing before traffic promotion.
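Once deployed, the agent can be invoked from any client application. The sketch below assumes the boto3 bedrock-agentcore data plane client and its invoke_agent_runtime operation; the ARN and session ID are hypothetical, and the exact parameter names and response shape should be verified against current AWS documentation.

import json
import boto3

client = boto3.client("bedrock-agentcore", region_name="us-east-1")

response = client.invoke_agent_runtime(
    # Hypothetical runtime ARN produced by the deployment step
    agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-agent",
    # Reusing the same session ID routes requests to the same microVM
    runtimeSessionId="user-42-session-00000000000000000001",
    payload=json.dumps({"prompt": "Summarize this week's critical bugs"}),
)

# Assumption: the entrypoint's return value comes back as a streamed body
print(response["response"].read())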
AgentCore Gateway converts existing APIs, AWS Lambda functions, and OpenAPI specifications into agent ready MCP tools automatically, without writing custom adapters.

From API to Agent Tool

Point Gateway at a Lambda function or OpenAPI specification and it automatically:

Generates the MCP tool schema
Handles protocol translation
Exposes the API as a discoverable agent tool

What previously required weeks of custom integration can now be done in minutes:

agentcore gateway create \

Once registered, any MCP compatible agent can discover and invoke the tool.

MCP Native Architecture

Gateway is built around the Model Context Protocol (MCP). Registered tools become automatically usable by MCP compatible frameworks such as:

Strands
LangGraph
CrewAI

Agents can dynamically discover tools at runtime rather than requiring tools to be hardcoded during initialization.

SaaS Integration

Gateway provides built in connectors for common enterprise platforms such as:

GitHub
Salesforce
Slack
Google Workspace
Microsoft 365
Jira / Confluence

These connectors handle authentication, schema generation, and error handling automatically.

Gateway also supports the Agent2Agent (A2A) protocol, which standardizes how agents communicate with each other. Agents built using different frameworks can delegate tasks across systems while communicating through standardized A2A messages.

AgentCore Identity manages authentication and credential delegation for AI agents accessing external systems. It controls both:

Who can invoke the agent
How the agent authenticates to external services

Supported authentication mechanisms include:

AWS IAM SigV4 for internal services
OAuth 2.0 and OpenID Connect for external users and applications

Compatible identity providers include Amazon Cognito, Okta, Microsoft Entra ID, and Auth0.

For system level tasks, agents authenticate using OAuth Client Credentials without a user involved. Common scenarios:

Scheduled workflows
Background analytics
System integrations

When agents act on behalf of a user, AgentCore manages the full OAuth lifecycle:

User consent flow
Encrypted token storage
Token refresh
Access auditing

All credentials are stored in an encrypted vault protected by customer managed KMS keys.

AgentCore Memory provides built in memory management for agents without requiring developers to build custom vector pipelines. It supports three types of memory:

Short Term Memory – maintains session context, including conversation history, tool outputs, and reasoning state
Long Term Memory – stores extracted knowledge such as user preferences, decisions, and discovered facts so future sessions begin with relevant context
Episodic Memory – stores past experiences (what actions were attempted and which strategies succeeded), enabling agents to improve behavior over time

Some enterprise systems can only be accessed through a web interface. AgentCore Browser provides isolated browser instances that agents can use to interact with websites and web applications. Agents can:

Navigate multi step workflows
Fill forms
Extract information from dynamic pages
Interact with internal portals

Each session runs in a sandboxed browser environment, which is destroyed when the session ends.

When agents generate code for analysis or computation, that code must execute safely. AgentCore Code Interpreter provides an isolated execution sandbox where generated code can run securely. Agents can use it to:

Analyze datasets
Run calculations
Generate charts and files
Validate generated code

Each execution occurs in a separate ephemeral sandbox with no access to other sessions or infrastructure.
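To make the sandbox pattern concrete, here is a deliberately bare bones local version: model generated code runs in a separate process with a hard timeout. This is a teaching stand-in, not the AgentCore API, which adds ephemeral, fully isolated execution environments on top of the same idea.

import os
import subprocess
import tempfile

def execute_untrusted(code: str, timeout_s: int = 10) -> str:
    """Run generated Python in a separate process with a hard timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            ["python3", path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.stdout if proc.returncode == 0 else proc.stderr
    finally:
        os.unlink(path)  # never leave generated code lying around

print(execute_untrusted("print(sum(range(10)))"))  # -> 45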
The Platform for the Production Agent Era

Having architected agentic systems across Azure Web Apps, Azure Container Apps, and custom infrastructure, I know how much engineering effort goes into the layers that production agents require. Session routing, credential management, memory pipelines, observability, governance policies, and multi tenant isolation are all necessary pieces of a reliable agent system. None of them are impossible to build, but they consume time that should be spent improving the reasoning, behavior, and usefulness of the agent itself. This is the problem Amazon Bedrock AgentCore is designed to solve.

AgentCore provides seven purpose built services that handle the production infrastructure required for agent systems:

Runtime: Secure microVM execution for agents
Gateway: MCP native tool integration and API exposure
Identity: OAuth credential lifecycle and delegated access
Memory: Short term and long term persistent memory for agents
Browser: Managed browser automation for web interactions
Code Interpreter: Isolated sandbox for executing generated code
Observability: CloudWatch native tracing with OpenTelemetry support

AgentCore is framework agnostic and works with common agent frameworks such as Strands, LangChain, LangGraph, LlamaIndex, CrewAI, and AutoGen, as well as custom implementations. It is also model agnostic, allowing agents to use foundation models including Amazon Nova, Anthropic Claude, OpenAI GPT models, Google Gemini, Meta Llama, and Mistral, or any model accessible through an API.

The question is no longer whether a production AI agent can be built. It is how quickly you can deliver it to the people who need it.

pip install bedrock-agentcore bedrock-agentcore-starter-toolkit

Thanks
