Cybersecurity researchers have identified a chain of serious remote code execution vulnerabilities affecting multiple high-profile AI inference frameworks from Meta, Nvidia, Microsoft, and major open-source projects. The findings reveal that insecure code patterns were copied across repositories, creating a systemic threat across the broader AI ecosystem.
A Vulnerability Spreading Through Code Reuse
According to security firm Oligo, the weaknesses share a common root. Developers reused code containing unsafe ZeroMQ and Python pickle operations, inadvertently replicating the same exploitable flaw across several frameworks. The issue was first identified in Meta’s Llama Stack, and the same pattern was then found in Nvidia TensorRT-LLM, vLLM, SGLang, and the Modular Max Server.
The vulnerable pattern relied on recv_pyobj(), a method provided by pyzmq (the Python bindings for ZeroMQ), which deserializes incoming data with Python’s pickle.loads(). Since pickle can execute arbitrary code during deserialization, any unauthenticated ZeroMQ socket exposed to the network effectively became a remote code execution vector.
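To see why unpickling untrusted bytes is dangerous, consider the minimal sketch below. It uses only the standard library and no ZeroMQ socket, so it illustrates the general behavior of pickle, not the code of any affected framework. The class name HarmlessProbe is hypothetical, and the payload only evaluates an arithmetic expression where a real attacker would invoke a shell command.

```python
import pickle


class HarmlessProbe:
    """Stand-in for an attacker-crafted object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A real attacker would name something like os.system here; this
        # sketch only evaluates a harmless arithmetic expression.
        return (eval, ("6 * 7",))


# The bytes an attacker would send to the exposed socket.
payload = pickle.dumps(HarmlessProbe())

# What the receiving side does implicitly when it unpickles a message.
result = pickle.loads(payload)

# The receiver executed eval("6 * 7"); it never got a HarmlessProbe back.
print(result)
```

The receiving process does not need to know the sender's class at all: the callable named in the payload runs as soon as the bytes are deserialized, before any application-level validation can take place.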
How the Security Issue Propagated Across the AI Ecosystem
Oligo researchers noted numerous instances where files were copied nearly line-for-line between different projects. Some even carried comments such as “Adapted from vLLM,” indicating that the insecure design was transferred without deeper security review.
Oligo refers to this widespread vulnerability pattern as “ShadowMQ,” highlighting how flaws in communication modules silently replicate across repositories. Because AI frameworks increasingly serve as foundational layers for enterprise deployments, contaminated code can have far-reaching consequences.
Patches Issued Across Major Frameworks
The flaw was initially reported to Meta in September 2024 and assigned CVE-2024-50050. Meta responded by replacing unsafe pickle-based operations with safer JSON serialization methods. This triggered a broader audit across the industry, uncovering similar vulnerabilities in:
• vLLM (CVE-2025-30165)
• Nvidia TensorRT-LLM (CVE-2025-23254)
• Modular Max Server (CVE-2025-60455)
Updated versions have since been released, and vendors have encouraged developers to upgrade immediately. Many organizations were found running inference servers with open ZeroMQ endpoints exposed on the public internet, increasing the likelihood of exploitation.
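The move from pickle to JSON described above can be sketched as follows. This is an illustrative example under stated assumptions, not Meta's actual patch: the function decode_message, the ALLOWED_OPS set, and the "op" field are all hypothetical. The point is that JSON parsing only produces plain data (dicts, lists, strings, numbers), so a malicious message can at worst be rejected.

```python
import json

# Hypothetical operation names, for illustration only.
ALLOWED_OPS = {"generate", "health"}


def decode_message(raw: bytes) -> dict:
    """Parse an incoming frame as JSON and validate its shape."""
    try:
        # json.loads never calls arbitrary code; it only builds plain data.
        message = json.loads(raw)
    except ValueError as exc:
        # Covers both invalid JSON and bytes that are not valid text.
        raise ValueError("message is not valid JSON") from exc

    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    if message.get("op") not in ALLOWED_OPS:
        raise ValueError("unknown or missing operation")
    return message


request = decode_message(b'{"op": "generate", "prompt": "hello"}')
print(request["op"])
```

With pyzmq, the corresponding socket methods are send_json() and recv_json(), which can replace send_pyobj() and recv_pyobj() when the messages are plain data. Explicit validation after parsing, as in the sketch, is still advisable.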
Why the Vulnerability Poses a Major Threat to AI Infrastructure
The affected inference servers process model weights, confidential user prompts, and sensitive workloads. If exploited, attackers could execute arbitrary code on GPU clusters, elevate privileges, exfiltrate proprietary models, implant persistent malware, or deploy GPU miners. The attack path could compromise both cloud and on-premise deployments.
The risk is heightened by the widespread use of frameworks such as SGLang, which has been adopted by leading players including xAI, AMD, Intel, Nvidia, Oracle Cloud, LinkedIn, and Google Cloud.
Warnings and Recommendations from Security Experts
Oligo emphasized that the vulnerability reflects deeper structural challenges in the AI software supply chain. Rapid development cycles, dependency reuse, and a rush to build high-performance inference servers have left critical gaps in security assurance.
Developers are urged to upgrade to patched releases, including Meta Llama Stack v0.0.41 or later, Nvidia TensorRT-LLM 0.18.2, vLLM v0.8.0, and Modular Max Server v25.6. Additional best practices include restricting pickle usage, enforcing HMAC and TLS authentication in ZeroMQ channels, and training development teams to identify unsafe serialization patterns.
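The HMAC recommendation can be illustrated with a short standard-library sketch. The helper names sign and verify and the frame layout (32-byte tag followed by the payload) are assumptions made for this example, not a format used by any of the frameworks named here. The receiver checks the tag before it parses or deserializes anything, so frames from a peer without the shared key are dropped early.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(key: bytes, payload: bytes) -> bytes:
    """Return a frame made of an HMAC-SHA256 tag followed by the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(key: bytes, frame: bytes) -> bytes:
    """Return the payload if the tag is valid, otherwise raise ValueError."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short to contain a tag")
    tag, payload = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC check failed")
    return payload


shared_key = b"k" * 32  # placeholder; load a real secret from secure config
frame = sign(shared_key, b'{"op": "health"}')
print(verify(shared_key, frame))
```

An HMAC of this kind provides authenticity and integrity only. It does not encrypt the traffic or prevent replay of captured frames, so it complements transport-level protection and network isolation and does not replace them.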
A Wake-Up Call for AI Security
The ShadowMQ incident underscores how quickly security flaws can propagate through the AI landscape when widely copied design patterns go unreviewed. As enterprise AI adoption accelerates, the industry faces a growing need for rigorous security standards to prevent small mistakes from turning into ecosystem-wide vulnerabilities.


