HTTP Request Tool
The HTTP Request tool lets any agent fetch a web page and parse it with Beautiful Soup. It is suited to reading articles, blog posts, documentation, and other web content as source material for social media posts.
Features
- Fetch any web page: Read the full content from any URL
- CSS selectors: Target specific elements using CSS selectors
- Link extraction: Optionally extract all links from a page
- Clean text extraction: Automatically removes scripts, styles, and excess whitespace
- Smart truncation: Limits content to prevent overwhelming the AI model
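The clean-text and truncation features can be pictured with a small sketch. This is not the tool's implementation (which uses Beautiful Soup); it relies only on the standard library's `html.parser`, and the names `TextExtractor`, `clean_text`, and `max_chars` are illustrative. The 10,000-character default mirrors the limit listed under Limitations.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def clean_text(html, max_chars=10_000):
    """Return whitespace-normalized visible text, cut to max_chars."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

Joining text chunks with a space keeps adjacent elements from running together, at the cost of occasionally splitting a word that spans inline tags.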
Usage
Basic URL Fetch
Fetch and read the entire content of a web page:
from influencerpy.tools.http_tool import http_request
result = http_request(url="https://example.com/article")
print(result["content"]) # Full page text
print(result["title"]) # Page title
Using CSS Selectors
Extract specific content using CSS selectors:
# Extract just the article content
result = http_request(
    url="https://example.com/blog/post",
    selector="article"
)
# Extract elements with a specific class
result = http_request(
    url="https://example.com/page",
    selector=".main-content"
)
# Extract an element by ID
result = http_request(
    url="https://example.com/docs",
    selector="#documentation"
)
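When the page structure is uncertain, one option is to try several selectors in order. The sketch below is hypothetical: `fetch_first_match` is not part of the tool, `fetch` stands in for `http_request` (passed in so the example runs offline), and it assumes that a selector with no match yields either an `error` key or empty `content`, which the documentation here does not confirm.

```python
def fetch_first_match(fetch, url, selectors):
    """Try each selector in order; return the first result with usable content."""
    last_result = {"url": url, "error": "no selectors given"}
    for selector in selectors:
        last_result = fetch(url=url, selector=selector)
        has_content = last_result.get("content", "").strip()
        if "error" not in last_result and has_content:
            return last_result
    # Nothing matched: hand back the last attempt so the caller can inspect it.
    return last_result
```

A call might look like `fetch_first_match(http_request, "https://example.com/blog/post", ["article", ".main-content"])`.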
Extracting Links
Get all links from a page:
result = http_request(
    url="https://example.com",
    extract_links=True
)
for link in result["links"]:
    print(f"{link['text']}: {link['url']}")
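Once links are extracted, a common follow-up is to narrow them down before fetching more pages. The sketch below works only on the documented `links` structure (dictionaries with `text` and `url` keys); the helper name `same_domain_links` and its `limit` parameter are hypothetical and not part of the tool.

```python
from urllib.parse import urlparse


def same_domain_links(result, limit=10):
    """Return unique links from `result` that stay on the page's own host."""
    page_host = urlparse(result["url"]).netloc
    seen = set()
    kept = []
    for link in result.get("links", []):
        url = link["url"]
        # Skip off-site links and URLs we have already kept.
        if urlparse(url).netloc != page_host or url in seen:
            continue
        seen.add(url)
        kept.append(link)
        if len(kept) == limit:
            break
    return kept
```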
Integration with Scouts
The HTTP Request tool is available to all agents through the tools configuration. To enable it for a scout:
When Creating a Scout
from influencerpy.core.scouts import ScoutManager
manager = ScoutManager()
scout = manager.create_scout(
    name="Tech Blog Monitor",
    type="meta",
    config={
        "tools": ["http_request"],  # Enable the tool
        "orchestration_prompt": "Monitor tech blogs and find interesting articles"
    },
    platforms=["x"]
)
Example: Blog Post Scout
Create a scout that monitors specific blog posts:
scout = manager.create_scout(
    name="ML Blog Watcher",
    type="meta",
    config={
        "tools": ["http_request"],
        "orchestration_prompt": """
        Read the latest posts from machine learning blogs.
        Use http_request to fetch article content and summarize key insights.
        """
    },
    prompt_template="Summarize technical articles with key takeaways and practical applications.",
    platforms=["x", "linkedin"]
)
Example: Link Aggregator Scout
Create a scout that finds and analyzes links:
scout = manager.create_scout(
    name="Resource Curator",
    type="meta",
    config={
        "tools": ["http_request"],
        "orchestration_prompt": """
        Find useful resources from curated lists.
        Use http_request with extract_links=True to find related content.
        """
    }
)
Return Format
The tool returns a dictionary with the following structure:
{
    "url": str,        # The requested URL
    "title": str,      # Page title (if available)
    "content": str,    # Extracted text content
    "links": [         # List of links (if extract_links=True)
        {
            "text": str,   # Link text
            "url": str     # Link URL (absolute)
        }
    ],
    "error": str       # Error message (if failed)
}
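For type-checked callers, the structure above could be described with `TypedDict`. This is a sketch, not something the library is known to ship: the names `Link`, `HttpRequestResult`, and `describe` are made up here, and marking every key as optional (`total=False`) is an assumption, since the format only says that some keys appear conditionally.

```python
from typing import List, TypedDict


class Link(TypedDict):
    text: str  # Link text
    url: str   # Absolute link URL


class HttpRequestResult(TypedDict, total=False):
    url: str
    title: str
    content: str
    links: List[Link]
    error: str


def describe(result: HttpRequestResult) -> str:
    """Build a one-line summary of a result, preferring the error if present."""
    url = result.get("url", "?")
    if "error" in result:
        return f"{url}: failed ({result['error']})"
    title = result.get("title") or "(no title)"
    return f"{url}: {title}, {len(result.get('content', ''))} chars"
```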
Error Handling
The tool gracefully handles errors and returns them in the response:
result = http_request(url="https://invalid-url.example")
if "error" in result:
    print(f"Failed: {result['error']}")
Common errors:
- Request timeout (10 seconds)
- Network connection issues
- Invalid URLs
- Parsing errors
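Because failures come back as data rather than exceptions, a caller can retry transient errors with a small wrapper. The sketch below rests on assumptions: `fetch_with_retry` is a hypothetical helper, and `fetch` stands in for `http_request` (it is passed in so the example runs without network access). Retrying on any `error` value is a simplification, since the exact error strings are not documented here.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, sleep=time.sleep, **kwargs):
    """Call fetch(url=..., **kwargs) until it returns a result without "error"."""
    result = {"url": url, "error": "no attempts made"}
    for attempt in range(attempts):
        result = fetch(url=url, **kwargs)
        if "error" not in result:
            return result
        if attempt < attempts - 1:
            sleep(delay)  # Pause before the next try.
    return result  # Last failing result, error included.
```

A call might look like `fetch_with_retry(http_request, "https://example.com/article", selector="article")`.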
Best Practices
- Use CSS selectors when you know the page structure to get cleaner content
- Check for errors in the response before processing content
- Combine with other tools like google_search to find URLs first
- Respect rate limits - don't hammer the same domain repeatedly
- Content length - The tool automatically truncates content over 10,000 characters
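One way to follow the rate-limit advice is to enforce a minimum gap between requests to the same host. This is a sketch under assumptions: the `DomainThrottle` class and its two-second default are illustrative choices, not part of the tool.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until `url`'s host may be requested again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.min_interval - (now - last))
        if pause > 0:
            self._sleep(pause)
        # Record when this request is actually allowed to go out.
        self._last_request[host] = now + pause
        return pause
```

Calling `throttle.wait(url)` immediately before each `http_request(url=url)` spaces out requests per host while leaving requests to different hosts unaffected.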
Example Workflow
A typical workflow combining multiple tools:
- Use google_search to find interesting articles
- Use http_request to read the full content
- Use the LLM to generate a social media post based on the content
# This is roughly what the agent does internally (illustrative pseudocode):
# 1. Search for content
search_results = google_search("machine learning breakthroughs 2026")
# 2. Extract a URL from the results (extract_first_url is a placeholder helper)
url = extract_first_url(search_results)
# 3. Fetch full content
article = http_request(url=url, selector="article")
# 4. Generate post (done by the agent automatically)
Limitations
- Timeout: Requests time out after 10 seconds
- Content length: Truncated at 10,000 characters
- Link limit: Maximum 50 links when extract_links=True
- JavaScript: Cannot execute JavaScript (use the browser tool for that)
- Authentication: Cannot handle login/authenticated pages
Comparison with Browser Tool
| Feature | http_request | browser |
|---|---|---|
| Speed | Fast | Slower |
| JavaScript | No | Yes |
| CSS Selectors | Yes | Yes |
| Multiple steps | No | Yes |
| Stability | Stable | Experimental |
Use http_request for simple content fetching and the browser tool for complex interactions requiring JavaScript.