Once you have effective prompts for individual tasks, the next challenge is connecting them into reliable workflows. Financial services demand more than single-shot responses: they require multi-stage pipelines with validation at every step. Prompt chaining and feedback loops are the mechanisms that transform individual prompts into robust, production-ready systems.
From Single Prompts to Pipelines
Consider a loan application review. A single prompt asking “Should we approve this loan?” might work for simple cases, but real underwriting requires multiple stages: data extraction, risk assessment, compliance checks, and final decision. Each stage has different requirements and potential failure points.
```mermaid
flowchart LR
    subgraph Single["Single Prompt"]
        I1[Input] --> O1[Output]
    end
    subgraph Pipeline["Prompt Pipeline"]
        I2[Input] --> S1[Stage 1]
        S1 --> G1{Gate}
        G1 -->|Pass| S2[Stage 2]
        G1 -->|Fail| R1[Retry/Error]
        S2 --> G2{Gate}
        G2 -->|Pass| S3[Stage 3]
        G2 -->|Fail| R2[Retry/Error]
        S3 --> O2[Output]
    end
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    class Pipeline blueClass
```
Prompt Chaining Fundamentals
Prompt chaining connects the inputs and outputs of prompts programmatically. The output of one LLM call becomes input for the next, creating a processing pipeline that can handle sophisticated, multi-step tasks.
A Simple Chain Example
Below is a minimal sketch of a two-stage chain for a loan application, assuming the OpenAI Python SDK; the model name, prompts, and sample data are illustrative rather than taken from a real system:
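```python
import json
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str, temperature: float = 0.2) -> str:
    """Single chat-completion call; the model name is a placeholder."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Illustrative sample input, not real applicant data
application_text = """Applicant: J. Doe. Annual income: $85,000.
Requested amount: $250,000. Credit score: 720."""

# Stage 1: Extract data from loan application
extraction_prompt = f"""Extract the applicant's annual income, requested amount,
and credit score from the application below. Return only JSON with keys
"income", "requested_amount", "credit_score".

Application:
{application_text}"""
extracted = json.loads(call_llm(extraction_prompt))

# Stage 2: Risk assessment, using the Stage 1 output as input
risk_prompt = f"""Given the applicant data below, assess repayment risk on a
0-100 scale. Return only JSON with keys "risk_score" and "rationale".

Applicant data:
{json.dumps(extracted, indent=2)}"""
assessment = json.loads(call_llm(risk_prompt))
```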
Why Chaining Matters for Finance
- Auditability: Each stage produces documented outputs that can be reviewed
- Specialization: Different prompts can be optimized for different tasks
- Validation Points: Errors can be caught between stages, not just at the end
- Modularity: Stages can be updated independently as requirements change
Gate Checks: Quality Control Between Steps
Chaining prompts isn’t enough on its own. LLMs can hallucinate, produce incorrect formats, or miss instructions. An error in an early stage cascades through the entire pipeline: the domino effect. Gate checks are programmatic validations placed between steps to ensure quality.
```mermaid
flowchart TD
    P1[Stage 1: Extract Data] --> G1{Format Check}
    G1 -->|Valid JSON| P2[Stage 2: Risk Assessment]
    G1 -->|Invalid| R1[Retry with Error Feedback]
    R1 --> P1
    P2 --> G2{Logic Check}
    G2 -->|Scores in Range| P3[Stage 3: Decision]
    G2 -->|Out of Range| R2[Retry with Error Feedback]
    R2 --> P2
    P3 --> G3{Content Check}
    G3 -->|Has Required Fields| O[Final Output]
    G3 -->|Missing Fields| R3[Retry with Error Feedback]
    R3 --> P3
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    class G1 orangeClass
    class G2 orangeClass
    class G3 orangeClass
```
Three Types of Gate Checks
1. Format Checks: Validate structure
A sketch using Pydantic (v2 API); the RiskAssessment fields mirror the loan example above and are assumptions, not a production schema:
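```python
from pydantic import BaseModel, Field, ValidationError

class RiskAssessment(BaseModel):
    """Expected shape of the risk-assessment stage output (fields are illustrative)."""
    risk_score: int = Field(ge=0, le=100)
    rationale: str = Field(min_length=20)
    recommended_action: str  # e.g. "approve", "refer", "decline"

def validate_format(raw_json: str) -> RiskAssessment | None:
    """Return the parsed assessment, or None if the structure is invalid."""
    try:
        return RiskAssessment.model_validate_json(raw_json)
    except ValidationError:
        return None
```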
2. Content Checks: Verify business logic
A sketch of a business-rule check over the parsed assessment; the thresholds are placeholders, not real underwriting policy:
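```python
def validate_content(assessment: RiskAssessment) -> bool:
    """Check that the assessment respects business rules (illustrative thresholds)."""
    allowed_actions = {"approve", "refer", "decline"}
    if assessment.recommended_action not in allowed_actions:
        return False
    # A high risk score should never be paired with an automatic approval.
    if assessment.risk_score >= 80 and assessment.recommended_action == "approve":
        return False
    return True
```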
3. Logic Checks: Ensure mathematical correctness
A sketch that cross-checks the extracted figures against the assessment; the loan-to-income rule is illustrative:
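```python
def validate_logic(extracted: dict, assessment: RiskAssessment) -> bool:
    """Check mathematical consistency between stages (illustrative rules)."""
    income = extracted.get("income", 0)
    requested = extracted.get("requested_amount", 0)
    if income <= 0 or requested <= 0:
        return False
    # Example rule: a loan over 5x annual income should not score as low risk.
    if requested > 5 * income and assessment.risk_score < 30:
        return False
    return True
```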
Handling Gate Check Failures
When a check fails, you typically have three options: retry the stage unchanged, retry with the validation error fed back into the prompt, or stop and escalate to a human reviewer.
A minimal sketch of the second approach that falls back to the third after a retry limit, reusing the illustrative `call_llm` helper and assuming `validate_fn` returns an `(ok, error_message)` tuple:
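```python
def run_stage_with_validation(prompt, input_data, validate_fn, max_retries=3):
    """Run one stage, retrying with the validation error fed back into the prompt."""
    error_feedback = ""
    for attempt in range(1, max_retries + 1):
        output = call_llm(f"{prompt}\n\nInput:\n{input_data}{error_feedback}")
        ok, error = validate_fn(output)
        if ok:
            return output
        # Feed the specific failure back into the next attempt.
        error_feedback = (
            f"\n\nYour previous answer failed validation (attempt {attempt}): {error} "
            "Correct the problem and answer again."
        )
    # Escalate rather than loop forever.
    raise RuntimeError(f"Stage failed validation after {max_retries} attempts")
```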
LLM Feedback Loops
Gate checks catch obvious errors, but what about quality improvements? Feedback loops allow the system to iteratively refine outputs until they meet quality thresholds.
```mermaid
flowchart TD
    I[Input] --> G[Generate]
    G --> E[Evaluate]
    E --> D{Meets Criteria?}
    D -->|Yes| O[Output]
    D -->|No| F[Feedback]
    F --> G
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff
    class E orangeClass
    class D greenClass
```
Anatomy of a Feedback Loop
The core mechanism is prompt chaining where evaluation feedback is incorporated into the next generation attempt:
A minimal sketch of the loop, reusing the illustrative `call_llm` helper; the PASS/FAIL review convention is an assumption:
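```python
def iterative_refinement(task, criteria, max_iterations=5):
    """Generate, evaluate, and refine until the output meets the criteria."""
    feedback = ""
    draft = ""
    for _ in range(max_iterations):
        # Generate (or regenerate) a draft, folding in any prior feedback.
        draft = call_llm(f"Task: {task}\n{feedback}")
        # Evaluate the draft against explicit criteria.
        review = call_llm(
            f"Evaluate the answer below against these criteria: {criteria}\n\n"
            f"Answer:\n{draft}\n\n"
            'Reply "PASS" if all criteria are met, otherwise list the problems.'
        )
        if review.strip().upper().startswith("PASS"):
            return draft
        feedback = f"A reviewer raised these issues with your last attempt:\n{review}\nAddress them."
    return draft  # best effort after max_iterations
```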
Sources of Feedback
Self-Correction: The LLM evaluates its own output
For example, a self-review prompt might look like this (wording is illustrative):
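```python
# Appended after a generation step; {previous_answer} is filled in via .format()
self_review_prompt = """
Review your previous answer for the following issues:
1. Are all figures taken directly from the provided data, not invented?
2. Is every claim consistent with the applicant's stated risk profile?
3. Is the required JSON structure followed exactly?

If you find problems, list them. If the answer is acceptable, reply "NO ISSUES".

Previous answer:
{previous_answer}
"""
```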
External Tools: Objective validation from code execution
A sketch that executes generated code in a subprocess and feeds interpreter errors back; a production system would sandbox this call:
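```python
import subprocess
import sys

def code_feedback(generated_code: str) -> dict:
    """Execute generated code and turn interpreter errors into feedback text."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", generated_code],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return {"passed": False, "feedback": "Execution timed out after 10 seconds."}
    if result.returncode == 0:
        return {"passed": True, "feedback": ""}
    return {"passed": False, "feedback": f"Execution failed:\n{result.stderr}"}
```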
Validation Checks: Programmatic verification
A sketch of a rule-based compliance check; the field names and thresholds are placeholders:
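```python
def compliance_feedback(recommendation: dict) -> dict:
    """Flag recommendations that violate simple hard-coded policy rules (illustrative)."""
    issues = []
    if "risk_disclosure" not in recommendation.get("sections", []):
        issues.append("Missing required risk disclosure section.")
    if recommendation.get("projected_return", 0) > 0.15:
        issues.append("Projected return above 15% requires additional justification.")
    return {"passed": not issues, "feedback": " ".join(issues)}
```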
Applied Example: Investment Advisory Pipeline
Let’s build a complete pipeline for generating and validating investment recommendations. This demonstrates both chaining and feedback loops.
The Pipeline Architecture
```mermaid
flowchart TD
    CP[Client Profile] --> IA[Investment Advisor Agent]
    IA --> REC[Initial Recommendation]
    REC --> CO[Compliance Officer Agent]
    CO --> D{Approved?}
    D -->|Yes| OUT[Final Recommendation]
    D -->|No| FB[Feedback]
    FB --> IA
    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    class IA blueClass
    class CO orangeClass
```
Implementation
A condensed sketch of the two-agent loop, reusing `call_llm` and `json` from the earlier example; the prompts are abbreviated and the temperature values match the discussion below:
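```python
ADVISOR_PROMPT = """
You are an investment advisor. Given the client profile below, propose an
asset allocation with a short rationale. Return only JSON with keys
"allocation" and "rationale".

Client profile:
{client_profile}

{feedback}
"""

COMPLIANCE_PROMPT = """
You are a compliance officer. Review the recommendation below against the
client profile. Reply with only JSON: {{"approved": true, "issues": ""}}.

Client profile:
{client_profile}

Recommendation:
{recommendation}
"""

def advisory_pipeline(client_profile: str, max_iterations: int = 3) -> str:
    feedback = ""
    recommendation = ""
    for _ in range(max_iterations):
        # Creative stage: higher temperature for diverse recommendations.
        recommendation = call_llm(
            ADVISOR_PROMPT.format(client_profile=client_profile, feedback=feedback),
            temperature=0.6,
        )
        # Validation stage: lower temperature for consistent reviews.
        review = json.loads(call_llm(
            COMPLIANCE_PROMPT.format(
                client_profile=client_profile, recommendation=recommendation
            ),
            temperature=0.2,
        ))
        if review.get("approved"):
            return recommendation
        feedback = f"A compliance reviewer rejected the last draft: {review.get('issues', '')}"
    raise RuntimeError("No compliant recommendation produced within the iteration limit")
```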
Temperature Settings Matter
Notice the different temperature settings:
- Investment Advisor (0.6): Higher temperature allows for creative, diverse recommendations
- Compliance Officer (0.2): Lower temperature ensures consistent, deterministic reviews
This pattern is common in financial pipelines: creative stages benefit from variability while validation stages need reliability.
Monitoring and Debugging Pipelines
Building the pipeline is half the battle; knowing if it’s working correctly is the other half.
What to Monitor
| Metric | Purpose | Example |
|---|---|---|
| Stage Success Rate | Track where failures occur | Stage 2 fails 15% of runs |
| Iteration Count | Measure refinement efficiency | Average 2.3 iterations to approval |
| Latency per Stage | Identify bottlenecks | Compliance check takes 4s average |
| Gate Check Failures | Understand error patterns | Format errors: 8%, Logic errors: 3% |
Structured Logging
A minimal sketch that emits one JSON log line per stage so runs can be queried and aggregated; the field names are illustrative:
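```python
import json
import logging
import time

logger = logging.getLogger("loan_pipeline")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_stage(stage: str, run_id: str, success: bool, attempt: int, started_at: float) -> None:
    """Emit one structured log line per pipeline stage."""
    logger.info(json.dumps({
        "run_id": run_id,
        "stage": stage,
        "success": success,
        "attempt": attempt,
        "latency_ms": round((time.time() - started_at) * 1000),
    }))

# Usage inside a stage:
# started = time.time()
# ...run the stage...
# log_stage("risk_assessment", run_id="abc-123", success=True, attempt=1, started_at=started)
```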
Common Pipeline Problems
| Problem | Symptom | Solution |
|---|---|---|
| Infinite Loops | Same error repeating | Add max iteration limits, vary retry prompts |
| Cascading Errors | Later stages always fail | Tighter gate checks on earlier stages |
| Slow Convergence | Many iterations to success | Improve feedback specificity |
| Inconsistent Output | Random failures | Lower temperature, stricter format requirements |
Best Practices for Financial Pipelines
1. Separate Configuration from Prompts
Evaluation criteria change with business rules. Keep them external:
For example, the thresholds could live in a JSON file that is loaded at runtime; the file name and keys below are illustrative:
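```python
# config/lending_rules.json (illustrative contents):
# {
#   "max_debt_to_income": 0.43,
#   "min_credit_score": 620,
#   "auto_decline_risk_score": 80
# }

import json

def load_lending_rules(path: str = "config/lending_rules.json") -> dict:
    """Load evaluation thresholds from configuration instead of hard-coding them in prompts."""
    with open(path) as f:
        return json.load(f)

rules = load_lending_rules()
gate_instruction = (
    "Decline automatically if the risk score is at or above "
    f"{rules['auto_decline_risk_score']} or the credit score is below "
    f"{rules['min_credit_score']}."
)
```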
2. Design for Failure Recovery
Every external call can fail. Plan for it:
A sketch using tenacity to wrap the illustrative `call_llm` helper with exponential backoff; the retry limits are assumptions:
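```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=30))
def call_llm_with_retry(prompt: str, temperature: float = 0.2) -> str:
    """Retry transient API failures with exponential backoff before giving up."""
    return call_llm(prompt, temperature=temperature)
```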
3. Preserve Audit Trails
Financial decisions require documentation:
A sketch that records every stage's inputs and outputs; it assumes the `run_stage_with_validation` helper from earlier plus hypothetical prompt constants and validators:
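```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

def run_pipeline_with_audit(input_data: dict) -> dict:
    """Run the pipeline while recording every stage's output for later review."""
    audit = {
        "run_id": str(uuid.uuid4()),
        "started_at": datetime.now(timezone.utc).isoformat(),
        "input": input_data,
        "stages": [],
    }
    # EXTRACTION_PROMPT, RISK_PROMPT, and the validators are hypothetical placeholders.
    extracted = run_stage_with_validation(EXTRACTION_PROMPT, input_data, validate_extraction)
    audit["stages"].append({"stage": "extraction", "output": extracted})

    assessment = run_stage_with_validation(RISK_PROMPT, extracted, validate_risk)
    audit["stages"].append({"stage": "risk_assessment", "output": assessment})

    # Persist the full trail alongside the decision.
    Path("audit").mkdir(exist_ok=True)
    with open(f"audit/{audit['run_id']}.json", "w") as f:
        json.dump(audit, f, indent=2, default=str)
    return {"decision": assessment, "audit": audit}
```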
4. Use Structured Outputs
Pydantic models ensure consistent data flow between stages:
For example (field names and constraints are illustrative, not a real underwriting schema):
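```python
from datetime import date
from pydantic import BaseModel, Field

class LoanApplication(BaseModel):
    """Typed input passed between pipeline stages (illustrative fields)."""
    applicant_name: str
    annual_income: float = Field(gt=0)
    requested_amount: float = Field(gt=0)
    credit_score: int = Field(ge=300, le=850)
    application_date: date

class LoanDecision(BaseModel):
    """Typed output of the final stage."""
    approved: bool
    risk_score: int = Field(ge=0, le=100)
    conditions: list[str] = Field(default_factory=list)
    rationale: str
```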
Takeaways
- Prompt chaining connects individual prompts into multi-stage pipelines where each output feeds the next input
- Gate checks validate outputs between stages, catching errors before they cascade through the pipeline
- Feedback loops enable iterative refinement by incorporating evaluation results into subsequent generation attempts
- Different stages need different settings: creative stages benefit from higher temperature while validation stages need consistency
- Monitoring is essential: track success rates, iteration counts, and stage latencies to identify and fix problems
- Design for failure: use retries with exponential backoff and preserve audit trails for compliance
This is the third post in my Applied Agentic AI for Finance series. Next: Modeling Agentic Workflows for Finance, where we’ll explore architectural patterns for complete financial agent workflows.