# HTTP Request Tool - Technical Overview

## Architecture

The HTTP Request tool integrates seamlessly with InfluencerPy's agent system:
```text
┌─────────────────────────────────────────────────────────────┐
│                          User/CLI                           │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                        Scout Manager                        │
│  - Orchestrates scout execution                             │
│  - Manages tool configuration                               │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                          AI Agent                           │
│  - Powered by Gemini/Anthropic                              │
│  - Equipped with selected tools                             │
└───────────────────────┬─────────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┬───────────────┐
        ▼               ▼               ▼               ▼
 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
 │   Google    │ │    HTTP     │ │   Reddit    │ │    ArXiv    │
 │   Search    │ │   Request   │ │    Tool     │ │    Tool     │
 └─────────────┘ └──────┬──────┘ └─────────────┘ └─────────────┘
                        │
                        ▼
             ┌──────────────────────┐
             │    Beautiful Soup    │
             │  - HTML Parsing      │
             │  - CSS Selectors     │
             │  - Text Extraction   │
             └──────────────────────┘
```
## Data Flow

### 1. Tool Invocation

```python
# Agent calls the tool
result = http_request(
    url="https://example.com/article",
    selector="article"
)
```
### 2. Request Processing

The tool fetches the page over HTTP and parses the response with Beautiful Soup.
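A condensed sketch of this step, assembled from the features documented under Key Features below; the header value and parser choice are assumptions, not the tool's exact implementation:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/article"

# Browser-like User-Agent (assumed value) and the documented 10-second timeout
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
```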
### 3. Response Handling

```python
# Agent receives structured data
{
    "url": "https://example.com/article",
    "title": "Article Title",
    "content": "Clean extracted text...",
    "links": [...]  # If requested
}
```
## Integration Points

### 1. Tool Registration

Located in: `src/influencerpy/core/scouts.py`

```python
# Import the tool
from influencerpy.tools.http_tool import http_request

# Add to agent tools list
if "http_request" in tools_config:
    agent_tools.append(http_request)
```
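For orientation, a sketch of how the assembled list might then be handed to a Strands agent; the project's actual wiring in `scouts.py` may differ, and model setup is omitted:

```python
from strands import Agent

# Construct the agent with the selected tools (sketch, not the project's
# exact construction code)
agent = Agent(tools=agent_tools)
response = agent("Read https://example.com/article and summarize it.")
```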
### 2. Prompt Configuration

Located in: `src/influencerpy/types/prompts.py`

```python
TOOL_INSTRUCTIONS = {
    "http_request": """TOOL: http_request
Use this to fetch and read content from any web URL.
..."""
}
```
### 3. Scout Configuration

User-facing configuration: each scout selects which tools its agent receives.
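The exact configuration schema is not shown in this document; the hypothetical sketch below simply mirrors the `tools_config` list checked during tool registration:

```python
# Hypothetical per-scout tool selection (names match the registration check)
tools_config = ["google_search", "http_request"]
```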
## Technical Implementation

### Core Function Signature

```python
from typing import Dict, Optional

@tool
def http_request(
    url: str,
    selector: Optional[str] = None,
    extract_links: bool = False
) -> Dict[str, str]:
    """Fetch and parse web content."""
```
### Key Features

#### 1. User Agent Spoofing
Prevents blocking by websites that reject requests carrying the default HTTP-client user agent.
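For instance, sending a browser-like header instead of requests' default `python-requests/x.y` identity; the exact string the tool uses is an assumption here:

```python
# Browser-like identity (illustrative value; any mainstream browser UA works)
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}
```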
#### 2. Timeout Protection

Prevents hanging on slow/unresponsive servers; requests abort after 10 seconds (see Limitations).

#### 3. Content Cleaning
```python
# Remove scripts and styles
for script in soup(["script", "style"]):
    script.decompose()

# Extract clean text
content = soup.get_text(separator=' ', strip=True)
```
#### 4. Content Truncation
Prevents overwhelming the AI model with too much text; output is capped at 10,000 characters (see Limitations).
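A one-line sketch of the cap; the constant name and truncation marker are assumptions:

```python
MAX_CONTENT_LENGTH = 10_000  # documented limit; constant name is assumed

if len(content) > MAX_CONTENT_LENGTH:
    content = content[:MAX_CONTENT_LENGTH] + "... [truncated]"
```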
#### 5. CSS Selector Support

```python
if selector:
    elements = soup.select(selector)
    content = "\n\n".join(elem.get_text() for elem in elements)
```
#### 6. Link Extraction

```python
from urllib.parse import urljoin

if extract_links:
    for link in soup.find_all('a', href=True):
        href = urljoin(url, link['href'])  # Make relative URLs absolute
        links.append({"text": link.get_text(strip=True), "url": href})
```
## Error Handling Strategy

```python
try:
    # Request and parsing logic
    ...
# Timeout subclasses RequestException, so it must be caught first
except requests.exceptions.Timeout:
    return {"url": url, "error": "Timeout"}
except requests.exceptions.RequestException as e:
    return {"url": url, "error": str(e)}
except Exception as e:
    return {"url": url, "error": f"Parsing error: {e}"}
```
## Performance Characteristics

### Typical Response Times

- Simple page: 0.5-2 seconds
- Complex page: 2-5 seconds
- Timeout: 10 seconds (then error)
### Resource Usage

- Memory: ~10-50 MB per request
- CPU: Low (parsing is fast)
- Network: Depends on page size
## Limitations
| Aspect | Limit | Reason |
|---|---|---|
| Content length | 10,000 chars | Prevent model overload |
| Links | 50 links | Prevent excessive data |
| Timeout | 10 seconds | Prevent hanging |
| JavaScript | Not supported | Use browser tool instead |
## Testing Strategy

### Unit Tests

```python
from unittest.mock import MagicMock, patch

# Mock the HTTP response so tests never touch the network
mock_response = MagicMock(status_code=200, text="<html><p>expected text</p></html>")

with patch("requests.get", return_value=mock_response):
    result = http_request(url="https://example.com")
    assert "expected text" in result["content"]
```
### Integration Tests

```python
# Verify Strands compatibility
assert hasattr(http_request, 'tool_spec')
assert http_request.tool_spec['name'] == 'http_request'
```
### Manual Testing

Call the tool directly from a Python session and inspect the result.
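A minimal smoke test, assuming the import path shown under Tool Registration:

```python
from influencerpy.tools.http_tool import http_request

result = http_request(url="https://example.com", selector="h1")
print(result.get("title"))
print(result.get("content", "")[:200])  # first 200 chars of extracted text
```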
## Security Considerations

### 1. URL Validation

The tool trusts the AI agent to provide valid URLs. In production, consider the following (sketched after the list):

- URL whitelist/blacklist
- Validating URL schemes (http/https only)
- Blocking internal IPs/localhost
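A minimal pre-flight check using only the standard library; the function name and policy are illustrative:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Illustrative check: scheme allowlist plus private-address blocking."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    try:
        # Resolve the host and reject loopback/private/link-local targets
        addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
    except (socket.gaierror, TypeError, ValueError):
        return False
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```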
### 2. Content Safety
- The tool extracts text only (no script execution)
- XSS is not a concern (no rendering)
- Content is sanitized by text extraction
### 3. Rate Limiting
Consider adding the following (a per-domain throttle is sketched below):

- Per-domain rate limits
- Request caching
- Backoff on errors
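A minimal per-domain throttle; the interval and module-level state are illustrative choices:

```python
import time
from urllib.parse import urlparse

_last_request: dict[str, float] = {}  # domain -> time of last request
MIN_INTERVAL = 1.0  # seconds between hits to the same domain (assumed)

def throttle(url: str) -> None:
    """Sleep until at least MIN_INTERVAL has passed for this domain."""
    domain = urlparse(url).netloc
    wait = MIN_INTERVAL - (time.monotonic() - _last_request.get(domain, 0.0))
    if wait > 0:
        time.sleep(wait)
    _last_request[domain] = time.monotonic()
```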
## Future Enhancements

### Phase 1: Stability
- [x] Basic implementation
- [x] Error handling
- [x] Unit tests
- [ ] Rate limiting per domain
- [ ] Request caching
### Phase 2: Features
- [ ] Custom headers support
- [ ] Cookie/session handling
- [ ] Retry logic with backoff
- [ ] Robots.txt checking
### Phase 3: Advanced
- [ ] JavaScript rendering (Playwright)
- [ ] Screenshot capture
- [ ] PDF extraction
- [ ] Form submission
## Comparison with Browser Tool

### When to Use HTTP Request
✅ Static content only
✅ Speed is important
✅ Simple extraction
✅ Reliable execution needed
### When to Use Browser Tool
✅ JavaScript required
✅ Complex interactions
✅ Form submissions
✅ Screenshot needed
## Dependencies

### Required Packages

```toml
[project]
dependencies = [
    "beautifulsoup4",  # HTML parsing
    "requests",        # HTTP client
    "strands-agents",  # Tool decoration
]
```
All dependencies are already in the project - no new installations needed!
## Code Quality

### Type Hints

All parameters and return values are annotated (see the core function signature above).
### Documentation
- ✅ Comprehensive docstrings
- ✅ Inline comments
- ✅ Usage examples
- ✅ User guide
### Testing
- ✅ Unit tests with mocking
- ✅ Integration tests
- ✅ Demo script
- ✅ Error case coverage
## Summary

The HTTP Request tool is:

- Fast: No browser overhead
- Reliable: Comprehensive error handling
- Flexible: CSS selectors for precise extraction
- Well-tested: Unit and integration tests
- Well-documented: Multiple documentation files
- Easy to use: Simple API, clear examples
Perfect for most web scraping needs in InfluencerPy! 🎯