It seems that this should be avoidable by various technical means, without interfering with the LLM itself. For example, a set of fingerprints can be generated for the token strings (pieces of lyrics) used during training. Then LLM output can be compared with these fingerprints, and output that is too similar in the legal sense can be discarded.
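A rough sketch of what that fingerprint check could look like, assuming word 5-gram "shingles" hashed with SHA-1 and a plain overlap ratio. The function names, the n-gram size, and the 0.5 threshold are all made up for illustration; none of them corresponds to a legal standard or to any real system's filter.

```python
import hashlib

def shingles(text, n=5):
    """Return hashed word n-grams (the 'fingerprints') for a piece of text."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}

def build_index(training_texts, n=5):
    """Collect the fingerprints of every training text into one set."""
    index = set()
    for text in training_texts:
        index |= shingles(text, n)
    return index

def overlap_ratio(output, index, n=5):
    """Fraction of the output's fingerprints that also appear in the index."""
    fingerprints = shingles(output, n)
    if not fingerprints:
        return 0.0  # output shorter than n words: nothing to compare
    return len(fingerprints & index) / len(fingerprints)

def should_discard(output, index, threshold=0.5, n=5):
    """Hypothetical policy: drop output whose overlap reaches the threshold."""
    return overlap_ratio(output, index, n) >= threshold
```

Something like `should_discard(candidate, build_index(lyrics))` would then run on each generated text before it is returned. Exact n-gram matching only catches near-verbatim copying, so paraphrases slip through, and picking the threshold is the hard part.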
Too similar in a legal sense is the multimillion-dollar question/problem.
There’s always a ton of awkward grey area with similar-sounding songs, and with where creativity begins and copyright ends, and that’s before you can legally prove beyond a shadow of a doubt that the possibly infringed material was used to train the model.
Ah yes, an unstoppable force (tech) meets an immovable object (music) lmao
Link without paywall: https://archive.ph/LcdWl