Building Financial Prompt Pipelines

Once you have effective prompts for individual tasks, the next challenge is connecting them into reliable workflows. Financial services demand more than single-shot responses - they require multi-stage pipelines with validation at every step. Prompt chaining and feedback loops are the mechanisms that transform individual prompts into robust, production-ready systems.

From Single Prompts to Pipelines

Consider a loan application review. A single prompt asking “Should we approve this loan?” might work for simple cases, but real underwriting requires multiple stages: data extraction, risk assessment, compliance checks, and final decision. Each stage has different requirements and potential failure points.

flowchart LR
    subgraph Single["Single Prompt"]
        I1[Input] --> O1[Output]
    end

    subgraph Pipeline["Prompt Pipeline"]
        I2[Input] --> S1[Stage 1]
        S1 --> G1{Gate}
        G1 -->|Pass| S2[Stage 2]
        G1 -->|Fail| R1[Retry/Error]
        S2 --> G2{Gate}
        G2 -->|Pass| S3[Stage 3]
        G2 -->|Fail| R2[Retry/Error]
        S3 --> O2[Output]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff

    class Pipeline blueClass

Prompt Chaining Fundamentals

Prompt chaining connects prompts programmatically: the output of one LLM call becomes the input to the next, forming a processing pipeline that can handle sophisticated, multi-step tasks.

A Simple Chain Example

# Stage 1: Extract data from loan application
extract_prompt = """
Extract the following from this loan application:
- Applicant name
- Requested amount
- Annual income
- Existing debts
- Credit score

Application: {application_text}
"""

# Stage 2: Assess risk using extracted data
assess_prompt = """
Given these applicant details, assess loan risk:
{extracted_data}

Evaluate:
1. Debt-to-income ratio
2. Credit score category
3. Loan-to-income ratio

Provide risk rating: LOW, MEDIUM, or HIGH
"""

# Stage 3: Make decision based on assessment
decision_prompt = """
Based on this risk assessment:
{risk_assessment}

Provide final recommendation:
- APPROVE, DECLINE, or REFER TO UNDERWRITER
- Reasoning for decision
- Any conditions if approved
"""

Why Chaining Matters for Finance

  1. Auditability: Each stage produces documented outputs that can be reviewed
  2. Specialization: Different prompts can be optimized for different tasks
  3. Validation Points: Errors can be caught between stages, not just at the end
  4. Modularity: Stages can be updated independently as requirements change

Gate Checks: Quality Control Between Steps

Chaining prompts isn’t enough on its own. LLMs can hallucinate, produce incorrect formats, or miss instructions. An error in an early stage cascades through the entire pipeline - the domino effect. Gate checks are programmatic validations placed between steps to ensure quality.

flowchart TD
    P1[Stage 1: Extract Data] --> G1{Format Check}
    G1 -->|Valid JSON| P2[Stage 2: Risk Assessment]
    G1 -->|Invalid| R1[Retry with Error Feedback]
    R1 --> P1

    P2 --> G2{Logic Check}
    G2 -->|Scores in Range| P3[Stage 3: Decision]
    G2 -->|Out of Range| R2[Retry with Error Feedback]
    R2 --> P2

    P3 --> G3{Content Check}
    G3 -->|Has Required Fields| O[Final Output]
    G3 -->|Missing Fields| R3[Retry with Error Feedback]
    R3 --> P3

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class G1 orangeClass
    class G2 orangeClass
    class G3 orangeClass

Three Types of Gate Checks

1. Format Checks: Validate structure

from pydantic import BaseModel, Field, ValidationError

class GateCheckError(Exception):
    """Raised when a stage output fails a gate check."""

class RiskAssessment(BaseModel):
    credit_score: int = Field(ge=300, le=850)
    debt_to_income: float = Field(ge=0, le=1)
    risk_level: str = Field(pattern="^(LOW|MEDIUM|HIGH)$")
    reasoning: str

def validate_format(output: str) -> RiskAssessment:
    """Validate output matches expected structure"""
    try:
        return RiskAssessment.model_validate_json(output)
    except ValidationError as e:
        raise GateCheckError(f"Format validation failed: {e}")

2. Content Checks: Verify business logic

def validate_content(assessment: RiskAssessment) -> bool:
    """Check content makes business sense"""
    # High credit score shouldn't result in HIGH risk without reason
    if assessment.credit_score > 750 and assessment.risk_level == "HIGH":
        if "fraud" not in assessment.reasoning.lower():
            return False

    # Debt-to-income over 0.5 should never be LOW risk
    if assessment.debt_to_income > 0.5 and assessment.risk_level == "LOW":
        return False

    return True

3. Logic Checks: Ensure mathematical correctness

def validate_logic(extracted: dict, assessment: RiskAssessment) -> bool:
    """Verify calculations are correct"""
    # Recalculate debt-to-income from source data
    expected_dti = extracted["total_debt"] / extracted["annual_income"]

    # Allow small tolerance for rounding
    if abs(expected_dti - assessment.debt_to_income) > 0.01:
        return False

    return True

Handling Gate Check Failures

When a check fails, you have three options: retry with error feedback, escalate to a human reviewer, or fail the request outright. The most common pattern, shown below, retries with the validation error fed back into the prompt:

def run_stage_with_validation(prompt, input_data, validate_fn, max_retries=3):
    for attempt in range(max_retries):
        output = call_llm(prompt.format(**input_data))

        try:
            return validate_fn(output)
        except GateCheckError as e:
            if attempt == max_retries - 1:
                raise  # Give up after max retries

            # Retry with error feedback (append the suffix only once)
            input_data["previous_error"] = str(e)
            if "{previous_error}" not in prompt:
                prompt = prompt + """

Your previous response had this error: {previous_error}
Please correct and try again.
"""

LLM Feedback Loops

Gate checks catch obvious errors, but what about quality improvements? Feedback loops allow the system to iteratively refine outputs until they meet quality thresholds.

flowchart TD
    I[Input] --> G[Generate]
    G --> E[Evaluate]
    E --> D{Meets Criteria?}
    D -->|Yes| O[Output]
    D -->|No| F[Feedback]
    F --> G

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class E orangeClass
    class D greenClass

Anatomy of a Feedback Loop

The core mechanism is prompt chaining where evaluation feedback is incorporated into the next generation attempt:

def iterative_refinement(task, criteria, max_iterations=5):
    current_output = None

    for iteration in range(max_iterations):
        if current_output is None:
            # First attempt
            output = generate(task)
        else:
            # Refinement attempt
            output = generate(task, previous=current_output, feedback=feedback)

        # Evaluate against criteria
        evaluation = evaluate(output, criteria)

        if evaluation.passes:
            return output

        # Prepare feedback for next iteration
        feedback = evaluation.feedback
        current_output = output

    return current_output  # Best effort after max iterations
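
The generate and evaluate calls are deliberately abstract. A hedged sketch of what they might look like when both sides are LLM calls (EvalResult and the prompt wording are assumptions, not a fixed API):

from dataclasses import dataclass

@dataclass
class EvalResult:
    passes: bool
    feedback: str

def generate(task: str, previous: str | None = None, feedback: str | None = None) -> str:
    """Produce a first draft, or a revision when feedback is available."""
    prompt = task
    if previous is not None:
        prompt += f"\n\nPrevious attempt:\n{previous}\n\nReviewer feedback:\n{feedback}\n\nRevise accordingly."
    return call_llm(prompt)

def evaluate(output: str, criteria: str) -> EvalResult:
    """Ask the model to judge the output; PASS at the start means done."""
    review = call_llm(
        f"Evaluate this output against the criteria.\n\nCriteria:\n{criteria}\n\n"
        f"Output:\n{output}\n\nReply PASS if all criteria are met; otherwise list the problems."
    )
    return EvalResult(passes=review.strip().startswith("PASS"), feedback=review)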

Sources of Feedback

Self-Correction: The LLM evaluates its own output

self_review_prompt = """
Review this investment recommendation for a conservative client:
{recommendation}

Check for:
1. Risk level appropriateness
2. Diversification adequacy
3. Clear rationale

Rate 1-10 and provide specific feedback if rating < 8.
"""

External Tools: Objective validation from code execution

import subprocess

def code_feedback(generated_code: str) -> dict:
    """Run tests and return feedback"""
    # Assumes generated_code has already been written to the module under test
    result = subprocess.run(
        ["pytest", "tests/"],
        capture_output=True,
        text=True
    )
    return {
        "passed": result.returncode == 0,
        "output": result.stdout,
        "errors": result.stderr
    }

Validation Checks: Programmatic verification

def compliance_feedback(recommendation: dict) -> dict:
    """Check against compliance rules"""
    issues = []

    if recommendation["equity_percentage"] > 60:
        issues.append("Equity allocation exceeds conservative threshold")

    if "risk_disclosure" not in recommendation:
        issues.append("Missing required risk disclosure statement")

    return {
        "approved": len(issues) == 0,
        "issues": issues
    }

Applied Example: Investment Advisory Pipeline

Let’s build a complete pipeline for generating and validating investment recommendations. This demonstrates both chaining and feedback loops.

The Pipeline Architecture

flowchart TD
    CP[Client Profile] --> IA[Investment Advisor Agent]
    IA --> REC[Initial Recommendation]
    REC --> CO[Compliance Officer Agent]
    CO --> D{Approved?}
    D -->|Yes| OUT[Final Recommendation]
    D -->|No| FB[Feedback]
    FB --> IA

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class IA blueClass
    class CO orangeClass

Implementation

ADVISOR_PROMPT = """
You are a Certified Financial Planner creating investment recommendations.

Client Profile:
{client_profile}

{previous_feedback}

Create a diversified investment recommendation including:
1. Asset allocation percentages
2. Specific fund recommendations
3. Risk assessment
4. Rationale aligned with client goals
"""

COMPLIANCE_PROMPT = """
You are a Compliance Officer reviewing investment recommendations.

Client Profile:
{client_profile}

Recommendation to Review:
{recommendation}

Evaluate:
1. Risk appropriateness for client profile
2. Diversification adequacy
3. Regulatory compliance
4. Suitability for stated goals

Rate 1-10. If 8 or above, respond with "APPROVED FOR CLIENT".
Otherwise, provide specific feedback for revision.
"""

def investment_advisory_pipeline(client_profile: dict, max_iterations: int = 3):
    recommendation = None
    previous_feedback = ""

    for iteration in range(max_iterations):
        # Generate recommendation
        advisor_response = call_llm(
            ADVISOR_PROMPT.format(
                client_profile=client_profile,
                previous_feedback=previous_feedback
            ),
            temperature=0.6  # Higher for creativity
        )
        recommendation = advisor_response

        # Compliance review
        compliance_response = call_llm(
            COMPLIANCE_PROMPT.format(
                client_profile=client_profile,
                recommendation=recommendation
            ),
            temperature=0.2  # Lower for consistency
        )

        if "APPROVED FOR CLIENT" in compliance_response:
            return {
                "recommendation": recommendation,
                "compliance_review": compliance_response,
                "iterations": iteration + 1
            }

        # Extract feedback for next iteration
        previous_feedback = f"""
Previous recommendation was not approved.
Compliance feedback: {compliance_response}
Please revise addressing these concerns.
"""

    return {
        "recommendation": recommendation,
        "status": "MAX_ITERATIONS_REACHED",
        "final_feedback": compliance_response
    }
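
Invoking the pipeline with a sample profile might look like this (the profile fields are illustrative, not a required schema):

client_profile = {
    "age": 58,
    "risk_tolerance": "conservative",
    "investment_horizon_years": 7,
    "goals": "capital preservation with modest income",
}

result = investment_advisory_pipeline(client_profile)
if result.get("status") == "MAX_ITERATIONS_REACHED":
    print("Escalate to a human advisor:", result["final_feedback"])
else:
    print(f"Approved after {result['iterations']} iteration(s)")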

Temperature Settings Matter

Notice the different temperature settings:

  • Investment Advisor (0.6): Higher temperature allows for creative, diverse recommendations
  • Compliance Officer (0.2): Lower temperature ensures consistent, deterministic reviews

This pattern is common in financial pipelines: creative stages benefit from variability while validation stages need reliability.
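
The call_llm helper used throughout this post is deliberately left abstract. One possible implementation, assuming the OpenAI Python SDK (any client that accepts a temperature parameter works the same way; the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt: str, temperature: float = 0.2, model: str = "gpt-4o") -> str:
    """Send a single-turn prompt and return the response text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content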

Monitoring and Debugging Pipelines

Building the pipeline is half the battle; knowing if it’s working correctly is the other half.

What to Monitor

| Metric | Purpose | Example |
|---|---|---|
| Stage Success Rate | Track where failures occur | Stage 2 fails 15% of runs |
| Iteration Count | Measure refinement efficiency | Average 2.3 iterations to approval |
| Latency per Stage | Identify bottlenecks | Compliance check takes 4s average |
| Gate Check Failures | Understand error patterns | Format errors: 8%, Logic errors: 3% |
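
A lightweight way to collect these numbers in-process, as a sketch (a production system would export them to a metrics backend instead):

from collections import defaultdict

class StageMetrics:
    """Track per-stage success counts and latency in memory."""

    def __init__(self):
        self.runs = defaultdict(lambda: {"success": 0, "failure": 0, "seconds": 0.0})

    def record(self, stage: str, succeeded: bool, seconds: float):
        entry = self.runs[stage]
        entry["success" if succeeded else "failure"] += 1
        entry["seconds"] += seconds

    def success_rate(self, stage: str) -> float:
        entry = self.runs[stage]
        total = entry["success"] + entry["failure"]
        return entry["success"] / total if total else 0.0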

Structured Logging

import logging
import json
from datetime import datetime

def log_stage(stage_name: str, input_data: dict, output: str, validation_result: dict):
    logging.info(json.dumps({
        "stage": stage_name,
        "timestamp": datetime.now().isoformat(),
        "input_summary": summarize(input_data),  # summarize() is a project-specific helper
        "output_length": len(output),
        "validation_passed": validation_result.get("passed"),
        "validation_errors": validation_result.get("errors", [])
    }))

Common Pipeline Problems

| Problem | Symptom | Solution |
|---|---|---|
| Infinite Loops | Same error repeating | Add max iteration limits, vary retry prompts |
| Cascading Errors | Later stages always fail | Tighter gate checks on earlier stages |
| Slow Convergence | Many iterations to success | Improve feedback specificity |
| Inconsistent Output | Random failures | Lower temperature, stricter format requirements |

Best Practices for Financial Pipelines

1. Separate Configuration from Prompts

Evaluation criteria change with business rules. Keep them external:

# config/lending_rules.json
{
    "credit_score_threshold": 650,
    "max_debt_to_income": 0.43,
    "risk_levels": ["LOW", "MEDIUM", "HIGH"]
}

# Load and inject into prompts
rules = load_config("lending_rules.json")
prompt = ASSESSMENT_PROMPT.format(
    credit_threshold=rules["credit_score_threshold"],
    dti_limit=rules["max_debt_to_income"]
)

2. Design for Failure Recovery

Every external call can fail. Plan for it:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_llm_with_retry(prompt: str) -> str:
    return llm_client.complete(prompt)

3. Preserve Audit Trails

Financial decisions require documentation:

from datetime import datetime

def run_pipeline_with_audit(input_data: dict) -> dict:
    audit_trail = []

    for stage in pipeline_stages:
        stage_result = run_stage(stage, input_data)

        audit_trail.append({
            "stage": stage.name,
            "timestamp": datetime.now(),
            "input": input_data,
            "output": stage_result,
            "validation": stage.validate(stage_result)
        })

        input_data = stage_result

    return {
        "result": input_data,
        "audit_trail": audit_trail
    }

4. Use Structured Outputs

Pydantic models ensure consistent data flow between stages:

from typing import Literal
from pydantic import BaseModel, Field

class LoanApplication(BaseModel):
    applicant_name: str
    requested_amount: float = Field(gt=0)
    annual_income: float = Field(gt=0)
    credit_score: int = Field(ge=300, le=850)

class RiskAssessment(BaseModel):
    application: LoanApplication
    risk_level: Literal["LOW", "MEDIUM", "HIGH"]
    debt_to_income: float
    recommendation: str

class FinalDecision(BaseModel):
    assessment: RiskAssessment
    decision: Literal["APPROVE", "DECLINE", "REFER"]
    conditions: list[str] = []
    reasoning: str
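
At runtime, each gate check then becomes a one-line parse; a ValidationError at a stage boundary is a gate-check failure (stage1_output and friends stand for the raw LLM responses):

application = LoanApplication.model_validate_json(stage1_output)
assessment = RiskAssessment.model_validate_json(stage2_output)
decision = FinalDecision.model_validate_json(stage3_output)

# Downstream code relies on typed fields instead of string parsing
if decision.decision == "APPROVE" and not decision.conditions:
    print(f"Unconditional approval for {application.applicant_name}")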

Takeaways

  1. Prompt chaining connects individual prompts into multi-stage pipelines where each output feeds the next input

  2. Gate checks validate outputs between stages, catching errors before they cascade through the pipeline

  3. Feedback loops enable iterative refinement by incorporating evaluation results into subsequent generation attempts

  4. Different stages need different settings - creative stages benefit from higher temperature while validation stages need consistency

  5. Monitoring is essential - track success rates, iteration counts, and stage latencies to identify and fix problems

  6. Design for failure - use retries with exponential backoff and preserve audit trails for compliance


This is the third post in my Applied Agentic AI for Finance series. Next: Modeling Agentic Workflows for Finance where we’ll explore architectural patterns for complete financial agent workflows.
