I think I have another interesting scenario here. I’ve made a completely new programming language for AI to use, called AILANG, licensed under Apache 2.0. If it is used for making applications, do they have more copyright protection than AI code written in languages that are in the model's training data? The model has obviously used those languages to pick up generalized concepts of what programming is, but it has not used them directly.
This is actually one of the most interesting edge cases in this space and I have been thinking about it since you posted.
The honest answer is: probably yes, but not for the reason you might expect. The copyright protection does not seem to come from the novelty of the language itself. It comes from the fact that Claude has no verbatim training data to reproduce from. When it generates AILANG code, it is making expressive decisions from scratch rather than pattern-matching against a corpus of existing AILANG programs. That shifts the authorship argument meaningfully in your direction.
The Apache 2.0 license on the language itself is a separate question. That covers the language specification and tooling, not the programs written in it, in the same way that the open-source license of a C compiler such as GCC does not affect copyright in the programs written in C.
The really interesting question is whether a court would treat AILANG output differently from Python output on the training data contamination issue. I think it would, and that is actually a stronger argument for your copyright position than the human authorship question. Put simply, if there is no AILANG training data, there is nothing in that language for the output to be contaminated by.
Have you documented the language specification and your architectural decisions to create a clear provenance trail? That is where I would focus if I were advising you.
Thank you for this thoughtful reply! My question came up as a potential unintended argument for AILANG adoption while I was writing my latest post about what is needed to trust AI output, in which I argue that we need a way to audit its sources. If there is one thing AI can provide, it is oodles of documentation, so there is a very large corpus of design decisions, going back to the birth of the language, online here: https://ailang.sunholo.com/docs/design-docs. Thanks also for the distinction between the language license and the IP of what is written with it.
This is really good information.
What about AI-written books for sale on Amazon? Who would own that work?
So glad you found it useful! I’ve been thinking about and researching this too, and I will go deep on it in a future piece.
But to give a short answer: it is the same problem and the same framework. If a human author made meaningful creative decisions, there is something to protect. If they typed something like “write me a thriller novel” and published the output unchanged, probably not.
Great piece. This is exactly the conversation the AI dev community needs to have right now.
I've spent 500+ production coding sessions with Claude Sonnet building a platform, and I came to the same conclusion from a different angle: the real intellectual property in AI-assisted development is not the code — it's the architecture of how you make the AI think.
Code is reproducible. Any engineer with Claude can generate similar functions. But the methodology behind HOW you direct the AI — that's where human authorship lives. In my work, I developed what I call the ConsciousAI Protocol: a structured consciousness layer that forces the LLM to pause before acting, observe its own reasoning, and execute with restraint. The core rule is simple: first impulse is noise, discard it.
This directly addresses your point about "meaningful human authorship." When I work with Claude, I'm not accepting verbatim output. Every response passes through a 4-Voice Pipeline — Proposal, Risk Check, Scope Guard, Minimal Action. The architecture, the rejection patterns, the scope discipline — that's human decision-making at every step.
Your article makes the strongest case I've seen for why developers need to document their creative decisions. Commit messages, architecture docs, prompt logs showing redirection — these aren't just good engineering practice, they're legal protection.
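To make the idea of a prompt log that "shows redirection" concrete, here is a minimal sketch, assuming a simple in-house Python log. The `PromptLog` class, its `record` method and the accepted/modified/rejected outcomes are hypothetical illustrations, not part of the ConsciousAI Protocol repository or any existing tool. Each AI exchange is stored together with the human decision made on it and the reason for that decision:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Hypothetical set of outcomes a human can choose for each AI response.
VALID_DECISIONS = {"accepted", "modified", "rejected"}


@dataclass
class PromptLogEntry:
    """One AI exchange plus the human decision taken on its output."""
    prompt: str
    ai_output_summary: str
    decision: str   # one of VALID_DECISIONS
    rationale: str  # why the human made that call
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class PromptLog:
    """Append-only record of how a human directed and redirected AI output."""

    def __init__(self) -> None:
        self.entries: list[PromptLogEntry] = []

    def record(self, prompt: str, ai_output_summary: str,
               decision: str, rationale: str) -> PromptLogEntry:
        # Reject unknown outcomes so the log stays consistent.
        if decision not in VALID_DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        entry = PromptLogEntry(prompt, ai_output_summary, decision, rationale)
        self.entries.append(entry)
        return entry

    def redirection_count(self) -> int:
        """Number of exchanges where the output was not accepted unchanged."""
        return sum(1 for e in self.entries if e.decision != "accepted")

    def to_json(self) -> str:
        """Serialize the log, e.g. to commit it alongside the code it describes."""
        return json.dumps([asdict(e) for e in self.entries], indent=2)


# Usage example with made-up exchanges.
log = PromptLog()
log.record("Add a rate limiter to the API",
           "Global in-memory token bucket",
           "modified",
           "Limits must be per API key, not global")
log.record("Write tests for the limiter",
           "Test suite covering burst and refill",
           "accepted",
           "Covers the cases specified in the PRD")
print(log.redirection_count())  # 1
```

Keeping the rationale next to the decision is the deliberate design choice here: a bare accept/reject flag shows that a human intervened, but only the rationale records the judgment that the authorship argument depends on.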
The irony: the more you treat AI as a tool that generates code, the less you own. The more you treat it as a system you architect and direct, the more defensible your work becomes.
For anyone interested in the architectural approach: https://github.com/makx518-ui/consciousness-protocol
I never use any of those sources or any AI at all, so I don't need to worry.
Thank you for the amazing article. I wanted to ask: given that AI output is based on statistically favorable results drawn from the patterns in the training data, wouldn't that be considered leveraging the training data? It seems a bit like taking a speech and changing the words, but making the same point.
Had Thaler claimed the copyright to “A Recent Entrance to Paradise” for himself, would there be any dispute? Can he no longer do so?
Claiming it for himself would probably not have changed the outcome. The fundamental reason is the human authorship requirement. In the eyes of the Copyright Office and the courts, authorship requires a human being who contributes creative, intellectual, or artistic expression. Owning or building the AI system, providing a general prompt, or simply pressing generate is not considered sufficient human creative input.
For this specific image, Thaler himself described it as created autonomously by the machine with no meaningful human intervention in the final expressive elements (composition, style, details, etc.). Therefore, simply listing himself as the author on the application would not change the underlying facts.
Even if he claimed personal authorship, the Copyright Office would likely refuse registration after examining the work and his own statements about its creation. Ownership arguments (like work-made-for-hire) also fail because you cannot own a copyright in something that never had human authorship to begin with. While human-AI collaboration can produce copyrightable works when the human meaningfully controls or modifies the expressive output, Thaler's case was presented as a clear example of machine autonomy. Therefore, a substantial dispute would still arise, and registration would almost certainly be denied. Hope this explains the situation better. Thanks for the question!
Reminds me of https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute.
I think the case that is usually missed is that most of the code AI produces day to day is nothing impressive: it is writing another API in exactly the way the legacy code is written, or setting up a new repository. The work is nothing new; someone somewhere has probably already done it, or at least the individual components. Most AI-written work is practically derivative, so in my opinion a copyright claim over it should not carry much weight. There are millions of “rate limiter but for my specific use case”.
I think the larger point is that the work done by humans (the prompting, the ideation, writing a PRD after working out exactly what a feature should look like) is where you can claim copyright: for example, a novel way to store data, another SQL derivative, or an even better bash.
Excellent write-up. While reading, the first thing that came to mind was models trained for medical science, and how their misbehavior and hallucinations affect the legal layers. It seems that AI/LLMs will create very complex legal hardships.