Building Financial Prompt Pipelines

Once you have effective prompts for individual tasks, the next challenge is connecting them into reliable workflows. Financial services demand more than single-shot responses - they require multi-stage pipelines with validation at every step. Prompt chaining and feedback loops are the mechanisms that transform individual prompts into robust, production-ready systems.

From Single Prompts to Pipelines

Consider a loan application review. A single prompt asking “Should we approve this loan?” might work for simple cases, but real underwriting requires multiple stages: data extraction, risk assessment, compliance checks, and final decision. Each stage has different requirements and potential failure points.

flowchart LR
    subgraph Single["Single Prompt"]
        I1[Input] --> O1[Output]
    end

    subgraph Pipeline["Prompt Pipeline"]
        I2[Input] --> S1[Stage 1]
        S1 --> G1{Gate}
        G1 -->|Pass| S2[Stage 2]
        G1 -->|Fail| R1[Retry/Error]
        S2 --> G2{Gate}
        G2 -->|Pass| S3[Stage 3]
        G2 -->|Fail| R2[Retry/Error]
        S3 --> O2[Output]
    end

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff

    class Pipeline blueClass

Prompt Chaining Fundamentals

Prompt chaining connects prompts programmatically: the output of one LLM call becomes the input to the next, forming a processing pipeline that can handle sophisticated, multi-step tasks.

A Simple Chain Example

# Stage 1: Extract data from loan application
extract_prompt = """
Extract the following from this loan application:
- Applicant name
- Requested amount
- Annual income
- Existing debts
- Credit score

Application: {application_text}
"""

# Stage 2: Assess risk using extracted data
assess_prompt = """
Given these applicant details, assess loan risk:
{extracted_data}

Evaluate:
1. Debt-to-income ratio
2. Credit score category
3. Loan-to-income ratio

Provide risk rating: LOW, MEDIUM, or HIGH
"""

# Stage 3: Make decision based on assessment
decision_prompt = """
Based on this risk assessment:
{risk_assessment}

Provide final recommendation:
- APPROVE, DECLINE, or REFER TO UNDERWRITER
- Reasoning for decision
- Any conditions if approved
"""

Why Chaining Matters for Finance

  1. Auditability: Each stage produces documented outputs that can be reviewed
  2. Specialization: Different prompts can be optimized for different tasks
  3. Validation Points: Errors can be caught between stages, not just at the end
  4. Modularity: Stages can be updated independently as requirements change

Gate Checks: Quality Control Between Steps

Chaining prompts isn’t enough on its own. LLMs can hallucinate, produce incorrect formats, or miss instructions. An error in an early stage cascades through the entire pipeline - the domino effect. Gate checks are programmatic validations placed between steps to ensure quality.

flowchart TD
    P1[Stage 1: Extract Data] --> G1{Format Check}
    G1 -->|Valid JSON| P2[Stage 2: Risk Assessment]
    G1 -->|Invalid| R1[Retry with Error Feedback]
    R1 --> P1

    P2 --> G2{Logic Check}
    G2 -->|Scores in Range| P3[Stage 3: Decision]
    G2 -->|Out of Range| R2[Retry with Error Feedback]
    R2 --> P2

    P3 --> G3{Content Check}
    G3 -->|Has Required Fields| O[Final Output]
    G3 -->|Missing Fields| R3[Retry with Error Feedback]
    R3 --> P3

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class G1 orangeClass
    class G2 orangeClass
    class G3 orangeClass

Three Types of Gate Checks

1. Format Checks: Validate structure

from pydantic import BaseModel, Field, ValidationError

class GateCheckError(Exception):
    """Raised when a stage output fails a gate check."""

class RiskAssessment(BaseModel):
    credit_score: int = Field(ge=300, le=850)
    debt_to_income: float = Field(ge=0, le=1)
    risk_level: str = Field(pattern="^(LOW|MEDIUM|HIGH)$")
    reasoning: str

def validate_format(output: str) -> RiskAssessment:
    """Validate output matches expected structure"""
    try:
        return RiskAssessment.model_validate_json(output)
    except ValidationError as e:
        raise GateCheckError(f"Format validation failed: {e}")

2. Content Checks: Verify business logic

def validate_content(assessment: RiskAssessment) -> bool:
    """Check content makes business sense"""
    # High credit score shouldn't result in HIGH risk without reason
    if assessment.credit_score > 750 and assessment.risk_level == "HIGH":
        if "fraud" not in assessment.reasoning.lower():
            return False

    # Debt-to-income over 0.5 should never be LOW risk
    if assessment.debt_to_income > 0.5 and assessment.risk_level == "LOW":
        return False

    return True

3. Logic Checks: Ensure mathematical correctness

def validate_logic(extracted: dict, assessment: RiskAssessment) -> bool:
    """Verify calculations are correct"""
    # Recalculate debt-to-income from source data
    expected_dti = extracted["total_debt"] / extracted["annual_income"]

    # Allow small tolerance for rounding
    if abs(expected_dti - assessment.debt_to_income) > 0.01:
        return False

    return True

Handling Gate Check Failures

When a check fails, you have three options: retry with error feedback, escalate to a human reviewer, or fail the request outright. The most common pattern, shown below, retries with the validation error fed back into the prompt:

def run_stage_with_validation(prompt, input_data, validate_fn, max_retries=3):
    for attempt in range(max_retries):
        output = call_llm(prompt.format(**input_data))

        try:
            return validate_fn(output)
        except GateCheckError as e:
            if attempt == max_retries - 1:
                raise  # Give up after max retries

            # Retry with error feedback (append the suffix only once)
            input_data["previous_error"] = str(e)
            if "{previous_error}" not in prompt:
                prompt = prompt + """

Your previous response had this error: {previous_error}
Please correct and try again.
"""

LLM Feedback Loops

Gate checks catch obvious errors, but what about quality improvements? Feedback loops allow the system to iteratively refine outputs until they meet quality thresholds.

flowchart TD
    I[Input] --> G[Generate]
    G --> E[Evaluate]
    E --> D{Meets Criteria?}
    D -->|Yes| O[Output]
    D -->|No| F[Feedback]
    F --> G

    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff
    classDef greenClass fill:#27AE60,stroke:#333,stroke-width:2px,color:#fff

    class E orangeClass
    class D greenClass

Anatomy of a Feedback Loop

The core mechanism is prompt chaining where evaluation feedback is incorporated into the next generation attempt:

def iterative_refinement(task, criteria, max_iterations=5):
    current_output = None

    for iteration in range(max_iterations):
        if current_output is None:
            # First attempt
            output = generate(task)
        else:
            # Refinement attempt
            output = generate(task, previous=current_output, feedback=feedback)

        # Evaluate against criteria
        evaluation = evaluate(output, criteria)

        if evaluation.passes:
            return output

        # Prepare feedback for next iteration
        feedback = evaluation.feedback
        current_output = output

    return current_output  # Best effort after max iterations
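
The generate and evaluate calls are deliberately abstract. A hedged sketch of what they might look like when both sides are LLM calls (EvalResult and the prompt wording are assumptions, not a fixed API):

from dataclasses import dataclass

@dataclass
class EvalResult:
    passes: bool
    feedback: str

def generate(task: str, previous: str | None = None, feedback: str | None = None) -> str:
    """Produce a first draft, or a revision when feedback is available."""
    prompt = task
    if previous is not None:
        prompt += f"\n\nPrevious attempt:\n{previous}\n\nReviewer feedback:\n{feedback}\n\nRevise accordingly."
    return call_llm(prompt)

def evaluate(output: str, criteria: str) -> EvalResult:
    """Ask the model to judge the output; PASS at the start means done."""
    review = call_llm(
        f"Evaluate this output against the criteria.\n\nCriteria:\n{criteria}\n\n"
        f"Output:\n{output}\n\nReply PASS if all criteria are met; otherwise list the problems."
    )
    return EvalResult(passes=review.strip().startswith("PASS"), feedback=review)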

Sources of Feedback

Self-Correction: The LLM evaluates its own output

self_review_prompt = """
Review this investment recommendation for a conservative client:
{recommendation}

Check for:
1. Risk level appropriateness
2. Diversification adequacy
3. Clear rationale

Rate 1-10 and provide specific feedback if rating < 8.
"""

External Tools: Objective validation from code execution

import subprocess

def code_feedback(generated_code: str) -> dict:
    """Run tests and return feedback"""
    # Assumes generated_code has already been written to the module under test
    result = subprocess.run(
        ["pytest", "tests/"],
        capture_output=True,
        text=True
    )
    return {
        "passed": result.returncode == 0,
        "output": result.stdout,
        "errors": result.stderr
    }

Validation Checks: Programmatic verification

def compliance_feedback(recommendation: dict) -> dict:
    """Check against compliance rules"""
    issues = []

    if recommendation["equity_percentage"] > 60:
        issues.append("Equity allocation exceeds conservative threshold")

    if "risk_disclosure" not in recommendation:
        issues.append("Missing required risk disclosure statement")

    return {
        "approved": len(issues) == 0,
        "issues": issues
    }

Applied Example: Investment Advisory Pipeline

Let’s build a complete pipeline for generating and validating investment recommendations. This demonstrates both chaining and feedback loops.

The Pipeline Architecture

flowchart TD
    CP[Client Profile] --> IA[Investment Advisor Agent]
    IA --> REC[Initial Recommendation]
    REC --> CO[Compliance Officer Agent]
    CO --> D{Approved?}
    D -->|Yes| OUT[Final Recommendation]
    D -->|No| FB[Feedback]
    FB --> IA

    classDef blueClass fill:#4A90E2,stroke:#333,stroke-width:2px,color:#fff
    classDef orangeClass fill:#F39C12,stroke:#333,stroke-width:2px,color:#fff

    class IA blueClass
    class CO orangeClass

Implementation

ADVISOR_PROMPT = """
You are a Certified Financial Planner creating investment recommendations.

Client Profile:
{client_profile}

{previous_feedback}

Create a diversified investment recommendation including:
1. Asset allocation percentages
2. Specific fund recommendations
3. Risk assessment
4. Rationale aligned with client goals
"""

COMPLIANCE_PROMPT = """
You are a Compliance Officer reviewing investment recommendations.

Client Profile:
{client_profile}

Recommendation to Review:
{recommendation}

Evaluate:
1. Risk appropriateness for client profile
2. Diversification adequacy
3. Regulatory compliance
4. Suitability for stated goals

Rate 1-10. If 8 or above, respond with "APPROVED FOR CLIENT".
Otherwise, provide specific feedback for revision.
"""

def investment_advisory_pipeline(client_profile: dict, max_iterations: int = 3):
    recommendation = None
    previous_feedback = ""

    for iteration in range(max_iterations):
        # Generate recommendation
        advisor_response = call_llm(
            ADVISOR_PROMPT.format(
                client_profile=client_profile,
                previous_feedback=previous_feedback
            ),
            temperature=0.6  # Higher for creativity
        )
        recommendation = advisor_response

        # Compliance review
        compliance_response = call_llm(
            COMPLIANCE_PROMPT.format(
                client_profile=client_profile,
                recommendation=recommendation
            ),
            temperature=0.2  # Lower for consistency
        )

        if "APPROVED FOR CLIENT" in compliance_response:
            return {
                "recommendation": recommendation,
                "compliance_review": compliance_response,
                "iterations": iteration + 1
            }

        # Extract feedback for next iteration
        previous_feedback = f"""
Previous recommendation was not approved.
Compliance feedback: {compliance_response}
Please revise addressing these concerns.
"""

    return {
        "recommendation": recommendation,
        "status": "MAX_ITERATIONS_REACHED",
        "final_feedback": compliance_response
    }
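
Invoking the pipeline with a sample profile might look like this (the profile fields are illustrative, not a required schema):

client_profile = {
    "age": 58,
    "risk_tolerance": "conservative",
    "investment_horizon_years": 7,
    "goals": "capital preservation with modest income",
}

result = investment_advisory_pipeline(client_profile)
if result.get("status") == "MAX_ITERATIONS_REACHED":
    print("Escalate to a human advisor:", result["final_feedback"])
else:
    print(f"Approved after {result['iterations']} iteration(s)")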

Temperature Settings Matter

Notice the different temperature settings:

  • Investment Advisor (0.6): Higher temperature allows for creative, diverse recommendations
  • Compliance Officer (0.2): Lower temperature ensures consistent, deterministic reviews

This pattern is common in financial pipelines: creative stages benefit from variability while validation stages need reliability.
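
The call_llm helper used throughout this post is deliberately left abstract. One possible implementation, assuming the OpenAI Python SDK (any client that accepts a temperature parameter works the same way; the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt: str, temperature: float = 0.2, model: str = "gpt-4o") -> str:
    """Send a single-turn prompt and return the response text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content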

Monitoring and Debugging Pipelines

Building the pipeline is half the battle; knowing if it’s working correctly is the other half.

What to Monitor

| Metric | Purpose | Example |
|---|---|---|
| Stage Success Rate | Track where failures occur | Stage 2 fails 15% of runs |
| Iteration Count | Measure refinement efficiency | Average 2.3 iterations to approval |
| Latency per Stage | Identify bottlenecks | Compliance check takes 4s average |
| Gate Check Failures | Understand error patterns | Format errors: 8%, Logic errors: 3% |
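
A lightweight way to collect these numbers in-process, as a sketch (a production system would export them to a metrics backend instead):

from collections import defaultdict

class StageMetrics:
    """Track per-stage success counts and latency in memory."""

    def __init__(self):
        self.runs = defaultdict(lambda: {"success": 0, "failure": 0, "seconds": 0.0})

    def record(self, stage: str, succeeded: bool, seconds: float):
        entry = self.runs[stage]
        entry["success" if succeeded else "failure"] += 1
        entry["seconds"] += seconds

    def success_rate(self, stage: str) -> float:
        entry = self.runs[stage]
        total = entry["success"] + entry["failure"]
        return entry["success"] / total if total else 0.0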

Structured Logging

import logging
import json
from datetime import datetime

def log_stage(stage_name: str, input_data: dict, output: str, validation_result: dict):
    logging.info(json.dumps({
        "stage": stage_name,
        "timestamp": datetime.now().isoformat(),
        "input_summary": summarize(input_data),  # summarize() is a project-specific helper
        "output_length": len(output),
        "validation_passed": validation_result.get("passed"),
        "validation_errors": validation_result.get("errors", [])
    }))

Common Pipeline Problems

| Problem | Symptom | Solution |
|---|---|---|
| Infinite Loops | Same error repeating | Add max iteration limits, vary retry prompts |
| Cascading Errors | Later stages always fail | Tighter gate checks on earlier stages |
| Slow Convergence | Many iterations to success | Improve feedback specificity |
| Inconsistent Output | Random failures | Lower temperature, stricter format requirements |

Best Practices for Financial Pipelines

1. Separate Configuration from Prompts

Evaluation criteria change with business rules. Keep them external:

# config/lending_rules.json
{
    "credit_score_threshold": 650,
    "max_debt_to_income": 0.43,
    "risk_levels": ["LOW", "MEDIUM", "HIGH"]
}

# Load and inject into prompts
rules = load_config("lending_rules.json")
prompt = ASSESSMENT_PROMPT.format(
    credit_threshold=rules["credit_score_threshold"],
    dti_limit=rules["max_debt_to_income"]
)

2. Design for Failure Recovery

Every external call can fail. Plan for it:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_llm_with_retry(prompt: str) -> str:
    return llm_client.complete(prompt)

3. Preserve Audit Trails

Financial decisions require documentation:

from datetime import datetime

def run_pipeline_with_audit(input_data: dict) -> dict:
    audit_trail = []

    for stage in pipeline_stages:
        stage_result = run_stage(stage, input_data)

        audit_trail.append({
            "stage": stage.name,
            "timestamp": datetime.now(),
            "input": input_data,
            "output": stage_result,
            "validation": stage.validate(stage_result)
        })

        input_data = stage_result

    return {
        "result": input_data,
        "audit_trail": audit_trail
    }

4. Use Structured Outputs

Pydantic models ensure consistent data flow between stages:

from typing import Literal
from pydantic import BaseModel, Field

class LoanApplication(BaseModel):
    applicant_name: str
    requested_amount: float = Field(gt=0)
    annual_income: float = Field(gt=0)
    credit_score: int = Field(ge=300, le=850)

class RiskAssessment(BaseModel):
    application: LoanApplication
    risk_level: Literal["LOW", "MEDIUM", "HIGH"]
    debt_to_income: float
    recommendation: str

class FinalDecision(BaseModel):
    assessment: RiskAssessment
    decision: Literal["APPROVE", "DECLINE", "REFER"]
    conditions: list[str] = []
    reasoning: str
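
At runtime, each gate check then becomes a one-line parse; a ValidationError at a stage boundary is a gate-check failure (stage1_output and friends stand for the raw LLM responses):

application = LoanApplication.model_validate_json(stage1_output)
assessment = RiskAssessment.model_validate_json(stage2_output)
decision = FinalDecision.model_validate_json(stage3_output)

# Downstream code relies on typed fields instead of string parsing
if decision.decision == "APPROVE" and not decision.conditions:
    print(f"Unconditional approval for {application.applicant_name}")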

Takeaways

  1. Prompt chaining connects individual prompts into multi-stage pipelines where each output feeds the next input

  2. Gate checks validate outputs between stages, catching errors before they cascade through the pipeline

  3. Feedback loops enable iterative refinement by incorporating evaluation results into subsequent generation attempts

  4. Different stages need different settings - creative stages benefit from higher temperature while validation stages need consistency

  5. Monitoring is essential - track success rates, iteration counts, and stage latencies to identify and fix problems

  6. Design for failure - use retries with exponential backoff and preserve audit trails for compliance


This is the third post in my Applied Agentic AI for Finance series. Next: Modeling Agentic Workflows for Finance where we’ll explore architectural patterns for complete financial agent workflows.
