[Medium] GHSA-3643-7v76-5cj2 PraisonAI knowledge-store backends interpolate unvalidated collection names into SQL and CQL queries

Summary

PraisonAI exposes optional SQL/CQL-backed knowledge-store implementations that build table and index identifiers from unvalidated name and collection arguments. Applications that pass untrusted collection names into these backends can trigger SQL or CQL injection.

Details

This issue affects the public persistence layer exported by persistence/__init__.py, which exposes KnowledgeStore and create_knowledge_store(). The factory registers the affected backends as supported knowledge-store providers in [persistence/factory.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/factory.py:112):

pgvector at [persistence/factory.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/factory.py:162)
cassandra at persistence/factory.py
singlestore_vector at persistence/factory.py

The common root cause is that the KnowledgeStore interface accepts free-form collection names in create_collection(), delete_collection(), insert(), upsert(), search(), get(), delete(), and count() at [persistence/knowledge/base.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/knowledge/base.py:44), but the affected backends interpolate those values directly into query text instead of validating or quoting them.

Representative sinks:

SingleStoreVectorKnowledgeStore builds table_name = f"{self.table_prefix}{name}" and executes raw DDL in [persistence/knowledge/singlestore_vector.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/knowledge/singlestore_vector.py:92). The same pattern is reused for delete_collection, insert, upsert, search, get, delete, and count.
PGVectorKnowledgeStore interpolates public.praison_vec_{collection} and idx_{name}_embedding directly into SQL text in [persistence/knowledge/pgvector.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/knowledge/pgvector.py:82).
CassandraKnowledgeStore interpolates name and collection directly into CREATE TABLE, DROP TABLE, INSERT, SELECT, DELETE, and COUNT statements in [persistence/knowledge/cassandra.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/knowledge/cassandra.py:73).
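
The pattern shared by these sinks can be illustrated with a minimal sketch. This is not PraisonAI source: build_drop_statement is a hypothetical helper, and the DROP TABLE statement and the praison_vec_ prefix are assumptions chosen to mirror the table_name = f"{self.table_prefix}{name}" construction described above. It only shows how the collection name ends up in the query text; whether the trailing statement actually executes depends on the driver and its multi-statement settings.

```python
# Hypothetical stand-in for the interpolation pattern; not PraisonAI code.
def build_drop_statement(table_prefix: str, name: str) -> str:
    # The collection name is concatenated into the identifier
    # with no validation and no quoting.
    table_name = f"{table_prefix}{name}"
    return f"DROP TABLE IF EXISTS {table_name}"


# A benign name yields the intended single statement.
benign = build_drop_statement("praison_vec_", "docs")

# A hostile name changes the structure of the query text itself.
hostile = build_drop_statement("praison_vec_", "docs; DROP TABLE users")

print(benign)
print(hostile)
```

Because the name is placed in identifier position, bound parameters cannot protect it; the value has to be validated or quoted before it reaches the query string.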

There is already an internal identifier validator in the conversation persistence layer:

validate_identifier() in [persistence/conversation/base.py](/Users/shmulc/Stuff/tmp/first-cve/scans/variant-hunt/PraisonAI/src/praisonai/praisonai/persistence/conversation/base.py:18) allows only alphanumeric characters and underscores

That validator is used for SQL identifiers such as table_prefix and schema in the conversation stores, but no equivalent validation is applied in the affected knowledge-store backends.
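
An equivalent allowlist check for collection names might look like the following sketch. It assumes only the behaviour described above (alphanumeric characters and underscores); validate_collection_name and safe_table_name are hypothetical names, not the signature of the existing validate_identifier() or any PraisonAI API.

```python
import re

# Allowlist: ASCII letters, digits, and underscores only (an assumption
# based on the described behaviour of the conversation-layer validator).
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_]+")


def validate_collection_name(name: str) -> str:
    """Return name unchanged if it is a safe identifier, else raise."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid collection name: {name!r}")
    return name


def safe_table_name(table_prefix: str, name: str) -> str:
    # Validate both parts before they are interpolated into query text.
    return f"{validate_collection_name(table_prefix)}{validate_collection_name(name)}"
```

Calling safe_table_name("praison_vec_", "docs") returns the expected identifier, while a name such as "docs; DROP TABLE users" raises ValueError before any query text is built.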

Version scope:

pgvector.py and cassandra.py were already present by v2.4.1
singlestore_vector.py was present by v2.4.3
as of May 1, 2026, the current PyPI release is 4.6.33, and it still contains the same interpolation patterns

Scope note for maintainers: I did not identify a built-in PraisonAI HTTP endpoint that forwards external request data into these specific persistence methods. The issue is in the package's public persistence APIs and affects applications that pass untrusted collection names to the affected backends.

PoC

The following local reproductions show that attacker-controlled collection names become part of the executed SQL text.

Reproduce the SingleStoreVectorKnowledgeStore.delete_collection() query construction:

python3 -
