Your Data Scientists Are Shipping to Prod. Who Owns the Risk?

Rebecca Banks

May 29, 2026

Ashish Rajan, a CISO, spent last week in Phoenix speaking with more than 400 CISOs and security leaders at the Leadership Exchange. His write-up is worth reading. Themes of visibility, observability, and velocity came up across every conversation, which is nothing new for security. What is new is the pressure AI is putting on all three at the same time, and where that pressure is actually coming from.

Most of the conversation in rooms like that defaults to the developer experience: AI coding assistants, unvetted packages, and dependencies entering the build pipeline faster than governance can keep up. That is a real problem. But a second problem is forming underneath it, one that security teams are only starting to name.

The circle of people who make open source software decisions in your organization just expanded. Your governance model probably hasn't.

Key Takeaways

The open source software governance model most organizations have was built around one assumption: developers make package decisions. AI tools have broken that assumption.
Data scientists, ML engineers, and technical operators are self-deploying environments and automation scripts that reach production data and systems, often outside any engineering review process. The OSS components they introduce are ungoverned and unowned.
Package governance enforced at the point of ingestion, defined by a catalog of pre-vetted open source components, is the only model that holds regardless of who is making the package request.

A second intake problem is forming outside engineering

Data scientists and ML engineers operate at the boundary of engineering and the business. They have the technical capability to write Python, pull packages, and deploy scripts and scheduled jobs that run against production data. They do not always operate inside the engineering team’s toolchain, code review process, or dependency governance workflow. They are, in the most direct sense, making open source software decisions that your security infrastructure may not be positioned to see.

A data scientist automating a model pipeline pulls in a handful of Python packages to handle data ingestion, transformation, and output. That script runs on a schedule. It touches production data. The OSS components it introduced were never reviewed, never added to a software bill of materials, and have no assigned owner for remediation when a CVE drops against one of them. Those dependencies are just as exposed to newly disclosed vulnerabilities as anything in your official build pipeline.
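To make the gap concrete, here is a minimal Python sketch of the inventory that never gets produced for a script like that. It lists every distribution installed in the interpreter running it, using the standard library's `importlib.metadata`. It is an illustration under stated assumptions, not an SBOM: it records only names and versions (no hashes, licenses, provenance, or SPDX/CycloneDX output), and the `owner` field and the `snapshot_environment` name are hypothetical.

```python
from importlib.metadata import distributions


def snapshot_environment(owner=None):
    """Return a sorted list of {name, version, owner} records for every
    distribution installed in the interpreter running this code.

    This is a bare inventory, not an SBOM: it captures no hashes, licenses,
    or provenance. `owner` is a hypothetical field marking where
    remediation ownership would be recorded.
    """
    records = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # no metadata file at all
        name = meta.get("Name")
        version = meta.get("Version")
        if not name or not version:
            continue  # skip distributions with missing or broken metadata
        # Keep the first entry seen for each name, case-insensitively.
        records.setdefault(
            name.lower(), {"name": name, "version": version, "owner": owner}
        )
    return sorted(records.values(), key=lambda r: r["name"].lower())


if __name__ == "__main__":
    for record in snapshot_environment():
        print(f'{record["name"]}=={record["version"]}  owner={record["owner"]}')
```

Run inside the pipeline's own environment, even this much answers "what is in there"; the `owner` column stays empty until someone is made responsible for it.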

The same pattern holds for ML engineers standing up inference environments, technical operators automating vendor or internal data workflows, and analysts building tooling that starts as a one-off and ends up running in production indefinitely. These are not shadow IT in the dismissive sense. These are skilled technical people doing real work.

Where visibility, observability, and velocity all break at once

Take Rajan’s three-word framework and apply it to this expanded intake surface.

Visibility. You cannot see what OSS components entered your environment through a data scientist’s pip install if that install never touched your developer toolchain. You have no inventory of what was introduced, no record of when, and no owner assigned to remediate when a CVE surfaces against one of those components.

Observability. You cannot observe whether the dependency posture of that model pipeline changed last week, because you were never observing it to begin with.
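Observing dependency posture means comparing inventories over time. A minimal sketch, assuming snapshots are plain `{package_name: version}` dictionaries captured on a schedule (the snapshot format, the `diff_snapshots` name, and the versions shown are hypothetical and illustrative, not taken from a real pipeline): the diff reports what was added, removed, or changed between two captures.

```python
def diff_snapshots(previous, current):
    """Compare two {package_name: version} snapshots of one environment.

    Returns a dict with:
      added   - packages present now but not before, with their version
      removed - packages present before but not now, with their old version
      changed - packages whose version differs, as (old, new) tuples
    """
    added = {name: current[name] for name in current.keys() - previous.keys()}
    removed = {name: previous[name] for name in previous.keys() - current.keys()}
    changed = {
        name: (previous[name], current[name])
        for name in previous.keys() & current.keys()
        if previous[name] != current[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


last_week = {"pandas": "2.1.4", "requests": "2.31.0", "pyyaml": "6.0.1"}
this_week = {"pandas": "2.2.0", "requests": "2.31.0", "sqlalchemy": "2.0.25"}
print(diff_snapshots(last_week, this_week))
# {'added': {'sqlalchemy': '2.0.25'}, 'removed': {'pyyaml': '6.0.1'}, 'changed': {'pandas': ('2.1.4', '2.2.0')}}
```

Without a baseline snapshot there is nothing to diff against, which is the point of the paragraph above.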

Velocity. By the time your security team discovers the exposure, the script has been running in production long enough to create real risk.

This is the structural limit of scan-and-pray policies: they were designed for a world where engineers made dependency decisions through controlled pipelines. As technical capability spreads beyond the engineering org, the intake surface grows in ways a reactive scanning posture cannot absorb.

Governance enforced at ingestion is the only model that holds

If the intake point for open source components is no longer exclusively controlled by your engineering team, your governance model has to move upstream of the intake point itself.

Scan-and-pray works, imperfectly, when engineers make dependency decisions through known pipelines and a scanner positioned downstream catches what they miss. It’s incomplete when the population of people making those decisions includes technical roles operating outside your engineering toolchain.

The alternative is governance enforced at the point of ingestion: a curated catalog of open source components built from verified source code, scored for real-world risk across the full dependency tree, and delivered with signed attestations and a complete software bill of materials. When a data scientist runs pip install to stand up a model pipeline, that package resolves from the governed catalog rather than from the public registry. This requires configuring the environment to route pip requests to the catalog instead of PyPI, the same private registry enforcement your engineering team already applies to production builds.
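The routing itself is ordinary pip configuration: pip reads an `index-url` setting from the `[global]` section of its config file (or from the `PIP_INDEX_URL` environment variable) and resolves packages from that index instead of PyPI. The Python sketch below renders such a file with the standard library's `configparser`. The catalog URL and the `render_pip_conf` name are hypothetical placeholders. Configuration alone is routing, not enforcement: command-line options such as `--index-url` override config files, so blocking direct access to the public registry is typically needed as well.

```python
import configparser
import io

# Hypothetical URL for a governed catalog's PEP 503 "simple" index.
CATALOG_INDEX_URL = "https://catalog.example.internal/simple/"


def render_pip_conf(index_url):
    """Render pip.conf text that points pip at a single package index.

    index-url replaces PyPI as the default index. No extra-index-url is set,
    so this config does not add the public registry back as a fallback.
    """
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_conf(CATALOG_INDEX_URL))
```

The design choice worth noting is the absence of `extra-index-url`: adding PyPI as an extra index would let any package missing from the catalog quietly resolve from the public registry.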

No finding gets generated for a dependency that never entered the environment ungoverned. No CVE lands in a remediation backlog for a component that never cleared your security threshold. No board conversation starts with explaining why a vulnerability sat unaddressed for 54 days² in a component that entered through a workflow your security team had no visibility into.

The scope of the problem just changed

Your engineering team is not your only OSS intake point anymore. If someone in your organization has the technical capability to pull a package and deploy something that touches production, they are making open source software decisions your security team may not be able to see, observe, or respond to in time.

The CISOs in Phoenix were right to keep returning to visibility, observability, and velocity. Those are the right frames. The question worth adding to that conversation: are you applying them to the full population of technical contributors in your organization, or only to the engineers?

If that is a conversation worth having inside your organization, we are glad to be part of it.

Talk to ActiveState

Frequently Asked Questions

Why isn't my existing scanner enough to catch OSS risk outside the engineering pipeline?

Scanners catch what enters your environment through pathways they are pointed at. When a data scientist runs pip install outside your developer toolchain, you may get no finding at all if that install path was never in scope. Where a finding does surface, it arrives after the dependency has already entered the environment, and remediation lands in a backlog that may have no clear owner.

What does "governance at the point of ingestion" mean in practice?

It means that policy is enforced before a package enters your environment, not after. A curated catalog of pre-vetted open source components sits between your users and the public registries. When a data scientist pulls a package for a model pipeline, the resolved package comes from the governed catalog rather than from the public registry. The security threshold is cleared before the component is introduced, so no finding is generated, no remediation ticket is created, and no liability clock starts.
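A minimal sketch of the admission decision at the center of this model, assuming the catalog can be expressed as approved (name, version) pairs. The `APPROVED` contents and the `is_admitted` function are hypothetical; a real catalog would also verify signatures and provenance, which this sketch does not attempt. Project names are normalized the way PEP 503 specifies, so `Scikit_Learn` and `scikit-learn` match the same entry.

```python
import re

# Hypothetical catalog contents: normalized name -> set of approved versions.
APPROVED = {
    "requests": {"2.31.0", "2.32.3"},
    "scikit-learn": {"1.4.2"},
}


def normalize(name):
    """Normalize a project name as PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_admitted(name, version, catalog=APPROVED):
    """Return True only if this exact name/version pair is in the catalog.

    Anything not explicitly approved is refused before installation,
    so in this model it never becomes a downstream finding.
    """
    return version in catalog.get(normalize(name), set())


print(is_admitted("Scikit_Learn", "1.4.2"))  # True
print(is_admitted("requests", "2.19.0"))     # False: version not approved
print(is_admitted("leftpad-py", "0.1"))      # False: package not in catalog
```

The default is deny: an unknown package or an unapproved version is refused, rather than installed and scanned afterward.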

How does the NIST NVD enrichment gap affect this?

NIST formally acknowledged in April 2026 that it can no longer enrich all CVEs at current submission volumes. CVEs outside its priority criteria now surface with no severity score, no impact assessment, and no product mapping. For organizations managing an expanded OSS intake surface, this compounds the problem: you are trying to prioritize remediation for components you may not have full visibility into, using enrichment data that is increasingly incomplete. A governed catalog model reduces the volume of ungoverned components entering the environment in the first place, which reduces the triage burden regardless of the state of NVD enrichment.
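One practical consequence of unenriched CVEs: a triage queue sorted by severity score can silently push unscored records to the bottom, as if they were low risk. A minimal sketch, assuming a hypothetical record format in which `cvss` is `None` when no score has been published (the `triage_order` name and the placeholder IDs are illustrative, not real CVEs). Ranking unscored findings first so they get human review is one possible policy, not a standard.

```python
def triage_order(findings):
    """Order findings for review, treating a missing CVSS score as unknown.

    Unscored findings (cvss is None) come first so they get a human look;
    scored findings follow, highest score first.
    """
    unscored = [f for f in findings if f["cvss"] is None]
    scored = sorted(
        (f for f in findings if f["cvss"] is not None),
        key=lambda f: f["cvss"],
        reverse=True,
    )
    return unscored + scored


findings = [
    {"id": "CVE-EXAMPLE-1", "package": "pkg-a", "cvss": 5.3},
    {"id": "CVE-EXAMPLE-2", "package": "pkg-b", "cvss": None},
    {"id": "CVE-EXAMPLE-3", "package": "pkg-c", "cvss": 9.8},
]
for finding in triage_order(findings):
    print(finding["id"], finding["cvss"])
```

Every unscored record still costs analyst time, which is why shrinking the number of ungoverned components matters more than any ordering rule.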

Does this only apply to large engineering organizations?

No. The pattern of technical roles outside the core engineering team making OSS decisions is common in any organization with a data function, an ML initiative, or technical operators building tooling. The governance gap scales with the number of people who have the capability to pull a package, not with the size of the engineering org.
