Skip to content

feat: taint checking and security#197

Open
ambrishrawat wants to merge 45 commits intogenerative-computing:mainfrom
ambrishrawat:security_poc
Open

feat: taint checking and security#197
ambrishrawat wants to merge 45 commits intogenerative-computing:mainfrom
ambrishrawat:security_poc

Conversation

@ambrishrawat
Copy link

This PR introduces a minimal proof-of-concept for taint and security propagation across CBlock, ModelOutputThunk, and session flows, as discussed in generative-computing/mellea#189
.

@ambrishrawat ambrishrawat marked this pull request as draft October 14, 2025 19:21
@mergify
Copy link

mergify bot commented Oct 14, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@nrfulton nrfulton self-requested a review October 15, 2025 16:54
@ambrishrawat
Copy link
Author

@nrfulton quick clarifications -

  1. What’s the best way for expose taint configuration to devs? e.g. when a description includes a user variable like summarise the following {{email_body}}, should taint be inferred automatically or something they can configure?
  2. Would it make sense to have a global strictness setting to toggle between warnings and exceptions for taint violations? Is blocify the best place for this?

@nrfulton
Copy link
Member

What’s the best way for expose taint configuration to devs? e.g. when a description includes a user variable like summarise the following {{email_body}}, should taint be inferred automatically or something they can configure?

We should infer automatically where-ever possible. I nthis case, I'm not sure how you would infer taint. I guess you assumption here is that email_boy -- or any user_variable input -- should entail taint?

@ambrishrawat
Copy link
Author

Yes, that was the thinking. Making it configurable may make more sense for taint. Any thoughts on the best way to expose that? Would love your take on the code too.

@davidcox
Copy link

If there is a tainted variable in the context, everything downstream should get tainted. As for how variables get tainted in the first place, a common way people do this is to define sources, sinks, and (optionally) washers. These are wrappers around interfaces that produce sensitive data (e.g. HR database api), or where it enters an unsafe place (e.g. sending to a UI).

@ambrishrawat ambrishrawat marked this pull request as ready for review November 12, 2025 11:11
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Comment on lines 79 to 80
component = CBlock("user input")
component.mark_tainted() # Sets SecLevel.tainted_by(component)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. CBlocks should be immutable.
  2. Naming a variable Ccmponent and assigning it to a CBlock is confusing.
  3. The cyclic reference here is a bit confusing and invites buggy code. Use tainted_by(None) instead of tainted_by(self) for the root node.
c = CBlock("user input", sec_level=SecLevel.tained_by(None))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this; tainted_by(None) for root now

component = CBlock("user input")
component.mark_tainted() # Sets SecLevel.tainted_by(component)

if component._meta["_security"].is_tainted():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not c.sec_level?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined it as a property and now this works as c.sec_level.is_tainted()

print(f"Original CBlock is tainted: {not tainted_desc.is_safe()}")

# Create session
session = MelleaSession(OllamaModelBackend("llama3.2"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the example critically depends on using a particular model, always use session = start_session() instead. This makes the examples easier to maintain.


# The result should be tainted
print(f"Result is tainted: {not result.is_safe()}")
if not result.is_safe():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use is_tainted instead of is_safe. The meaning of safe is very ambiguous.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed all instances of is_safe

Returns:
The CBlock or Component that tainted this content, or None
"""
if self.level_type == "tainted_by":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially in a module called security.core, we should avoid use of magic strings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created SecLevelType enum

sources.append(action)

# For Components, check their constituent parts for taint
if hasattr(action, 'parts'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead us something like:

match action:
     case Component...

    case CBlock...

(If type(action) :> Component then check is not necessary because the Component protocol has a parts() method. )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it to use match/case

Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
@ambrishrawat
Copy link
Author

Thanks for the review @nrfulton !
I have incorporated your suggestions. Appreciate another pass when you get the chance

Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
@guicho271828
Copy link
Contributor

hi ambrish!

@nrfulton nrfulton self-requested a review December 2, 2025 02:15
@mergify
Copy link

mergify bot commented Jan 14, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Signed-off-by: Ambrish Rawat <ambrish.rawat@ie.ibm.com>
@mergify
Copy link

mergify bot commented Jan 16, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Jan 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Jan 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Jan 23, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Jan 26, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Jan 28, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Feb 1, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Feb 8, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@mergify
Copy link

mergify bot commented Feb 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

def wrapper(*args, **kwargs):
# Check all arguments for marked content (tainted or classified)
for arg in args:
if isinstance(arg, TaintChecking):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's an issue with here in not looking recursively (ie via taint_sources() )? the args may be ok, but we could have a tainted Component?

Copy link
Author

@ambrishrawat ambrishrawat Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see that, yes need to call taint_sources here and do a recursive check for classified sources as well. I will update this.

@mergify
Copy link

mergify bot commented Feb 25, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:



def declassify(cblock: "CBlock") -> "CBlock":
"""Create a declassified version of a CBlock (non-mutating).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only CBlocks or any subclass? If the latter, we return a CBlock which would lose the subclass info?
For example could it be an ImageBlock?
Also note that ImageBlock never calls super().init() which means the value that is being copied does not exist == crash?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this particular declassify is merely illustrative, hence only applicable to CBlocks. One would need bespoke declassification methods for applications to sparsify taint information across the code. That was my initial thinking

@ambrishrawat ambrishrawat requested review from a team and jakelorocco as code owners February 25, 2026 18:12
@mergify
Copy link

mergify bot commented Feb 27, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants