Discussion about this post

GB

Okay, help me understand this. You write: "capability, continuity, and what we might call behavioral identity aren't purely intrinsic to the weights. They're relational artifacts of the scaffolding." You cite the ClawEnvKit paper as empirical grounding.

I went back to the source. The 15.7 percentage point figure is an "up to" number, i.e. the maximum improvement of an engineered harness over a bare ReAct baseline. It is not harness-to-harness variation. And the paper measures task completion, not identity or continuity.

So the empirical finding is that scaffolding affects performance, which is uncontroversial. The move from there to "behavioral identity is a relational artifact of the scaffolding" seems to need separate justification. Performance and identity are different categories.

Am I reading the paper differently than you intended, or is the philosophical claim resting on considerations the piece doesn't fully spell out?
