local-deep-research is Vulnerable to HTML Injection via Unescaped User Input in PDF Export (`pdf_service.py:_markdown_to_html`)
Summary
PDFService._markdown_to_html() builds an HTML document by interpolating user-controlled values directly into f-strings without any HTML escaping. The affected values are title (sourced from research.title or research.query) and the metadata key-value pairs. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document that WeasyPrint processes during PDF export. The injection can be chained into a Server-Side Request Forgery (SSRF) that bypasses the application's existing SSRF defenses in ssrf_validator.py.
Details
Vulnerable code: src/local_deep_research/web/services/pdf_service.py, lines 171–176
# pdf_service.py:171-176
if title:
    html_parts.append(f"{title}")  # ← title is not escaped
if metadata:
    for key, value in metadata.items():
        html_parts.append(f'')  # ← key/value are not escaped
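One way to close this injection point is to escape every interpolated value before it reaches the f-string. The following is a minimal sketch, not the project's actual patch. The helper name build_head_parts is hypothetical, and the surrounding tag names (a title element and meta elements) are assumptions made for illustration, since the snippet above shows only the interpolated values.

```python
from html import escape


def build_head_parts(title=None, metadata=None):
    """Return HTML head fragments with every interpolated value escaped.

    Hypothetical helper; the <title> and <meta> tags are assumed for
    illustration and may not match the real template.
    """
    html_parts = []
    if title:
        # escape() turns &, < and > into entities, so the title can no
        # longer close its enclosing tag or open a new one.
        html_parts.append(f"<title>{escape(str(title))}</title>")
    if metadata:
        for key, value in metadata.items():
            # quote=True is the default, so " and ' are escaped as well,
            # which keeps values from breaking out of the attribute.
            html_parts.append(
                f'<meta name="{escape(str(key))}" content="{escape(str(value))}">'
            )
    return html_parts


# Illustrative payloads only; internal.example is a placeholder host.
payload = '</title><img src="http://internal.example/">'
parts = build_head_parts(title=payload, metadata={"author": '" /><img src="x'})
print("\n".join(parts))
```

Escaping at the point of interpolation, rather than sanitizing research.query when it is stored, keeps the database value intact and protects every caller of the function regardless of where the title came from.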
Data flow trace:
User input: research.query
    │
    ▼
research_routes.py:1321
    pdf_title = research.title or research.query
    │
    ▼
research_routes.py:1325-1326
    export_report_to_memory(report_content, format, title=pdf_title)
    │
    ▼
pdf_service.py:107
    PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
    │
    ▼
pdf_service.py:137
    _markdown_to_html(markdown_content, title, metadata)
    │
    ▼
pdf_service.py:172
    f"{title}"  ← injection point, no escaping
    │
    ▼
pdf_service.py:112
    HTML(string=html_content)  ← WeasyPrint renders the injected HTML
research.query is a string submitted by the user via POST /api/start_research, stored as-is in the database, and retrieved without any sanitization. When the user triggers POST /api/v1/research//export/pdf, this value is embedded unescaped into the HTML document processed by WeasyPrint.
Injection point 1: `` tag breakout
Input:
Rendered:
When WeasyPrint encounters the injected tag, it issues an HTTP GET request to the URL in its src attribute by default.
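As defense in depth against this default fetching behavior, resource loading can be restricted at the renderer. The sketch below wraps a fetcher callable so that only allow-listed URL schemes are passed through. The names make_restricted_fetcher and allowed_schemes are hypothetical, and the assumption that exported reports need no remote resources may not hold for every deployment. WeasyPrint's HTML class accepts a url_fetcher argument, though the exact fetcher interface varies between versions, so the wiring shown in the final comment should be checked against the installed release.

```python
from urllib.parse import urlsplit


def make_restricted_fetcher(delegate, allowed_schemes=frozenset({"data"})):
    """Wrap a fetcher so that only allow-listed URL schemes are fetched.

    Hypothetical helper: `delegate` is whatever callable would normally
    fetch the resource.
    """

    def fetcher(url, *args, **kwargs):
        scheme = urlsplit(url).scheme.lower()
        if scheme not in allowed_schemes:
            # Raising here stops the request before any network access.
            raise ValueError(f"blocked resource fetch: {url!r}")
        return delegate(url, *args, **kwargs)

    return fetcher


# Possible wiring (assumption, not taken from the project's code):
#   from weasyprint import HTML, default_url_fetcher
#   HTML(string=html_content,
#        url_fetcher=make_restricted_fetcher(default_url_fetcher))
```

An allow-list of schemes is used rather than a block-list of hosts because the renderer otherwise has to resolve and validate each destination itself, which is the check the injection sidesteps.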
Injection point 2: `` attribute breakout
Input: " />