Wikipedia talk:Proposal to expand WP:CSD/Proposal V (Copyright violations)

From Wikipedia, the free encyclopedia
The talk page Wikipedia talk:Proposal to expand WP:CSD was split into individual talk pages for each proposal, to limit the size of the talk page and facilitate individual discussions on each proposal. The history and attributions for the comments made before the split can be seen by following the history link on the /General talk page.

Blatant copyvios

I would also like to see some rule allowing the deletion of blatant copyright violations. The percentage of articles that are rescued after being posted on Wikipedia:Copyright problems is minuscule and the danger of losing good content by making them easier to delete is small. - SimonP 17:46, Dec 5, 2004 (UTC)

  • No harm in putting it up to a vote. BLANKFAZE | (что??) 18:14, 5 Dec 2004 (UTC)
    • You'll have to be more specific as to what constitutes "blatant" copyright violation, since generally it's near impossible to ascertain whether a user does or does not own the copyrights or have authorisation to license content under the GFDL, even if you do find the exact same content on some webpage. Of course, in 99.99% of all cases it will be a copyvio, but I'd like to have the proposed CSD policy state explicitly that as long as the content can be found elsewhere and the user has not claimed to be the copyright owner or authorised to license content the policy applies. --fvw* 02:33, 2004 Dec 6 (UTC)
      • Agreed, but it should also be noted that the danger of false positives in these cases is very low, since if the content turns out to have been legitimate it can just be recopied. - SimonP 03:08, Dec 6, 2004 (UTC)
        • Yes, I'm all for it, I just want it explicit as vague CSD criteria are causing enough trouble already. --fvw* 13:59, 2004 Dec 6 (UTC)
  • Why do we need this, when {{copyvio}} tag is perfectly fine?? Enochlau 03:13, 1 Jan 2005 (UTC)

Proposal V

Any article that consists only of content in blatant, easily verifiable violation of copyright that has not since been subsequently edited or improved by another user and was submitted by a user or IP with no legitimate contributions.

Completely agree with the intent, completely disagree with the wording. Reasons: what's "blatant"? What's "easily verifiable"? And of course, what's a "legitimate contribution"? Don't single out "legitimate contributions", ever, unless you want to explicitly accuse specific individuals of being sockpuppets.

Surely we just want Proposal V to stop enthusiastic newbies and spam vandals from cutting and pasting swaths of text from websites—for anything more, we'll just have to talk it out on Wikipedia:Copyright problems. To that end, I suggest:

Any article by precisely one editor, with content identical to material that exists elsewhere, when this material is not immediately verifiable as compatible with the GFDL. The editor must subsequently be informed on their talk page that such deletion has happened, with an external reference to the existing material, and instructions on how to prevent this from happening again.

A few important points of clarification.

  1. The requirement of "precisely one editor" might seem very strong, but is the only workable option. If there are multiple editors, they could all suddenly find their edits gone without indication of what happened, and it's not reasonable to inform every single editor personally.
  2. Similarly, "identical to" looks very strong, but is necessary: if the article's content is not identical, but contains edits, it's possible that it is the only existing copy, and you are needlessly making someone's derivative work unavailable because they forgot to say they held the copyright to the original material.

    An alternative is to merely demand that the material is clearly derived from material that exists elsewhere, but only if administrators are prepared to promptly undelete any speedy based on material that turns out to be GFDL-compatible, and only if we're prepared to deal with the shock and hurt of people who suddenly lose their preciously edited article because they were careless. I say it's not worth it: just put it on Wikipedia:Copyright problems if you can only claim clear derivation. Of course, as always, we rely on the good judgement of administrators: if the only difference is that, say, one sentence has been removed in the middle, you should still call it "identical" (or rather "trivial to restore it if you'd want to").

  3. Suspicious material might be under a license that's compatible with the GFDL, or it might even be copyrighted material that the copyright holder genuinely wishes for us to have. Doesn't matter, it should still go. Licenses and the way to get permission can be arbitrarily complicated, and we can afford to err on the safe side: for this to apply the content must be identical to suspect material, and then it's trivial to reinstate it.
  4. We can discuss limiting the posted notification to registered users only, so as not to burden admins unnecessarily with messages that are likely to never be noticed or go to the wrong person. I do believe we should always inform registered users, just in case it's a clueless newbie problem or a cooperating copyright holder of few words. This can simply be a boilerplate message asking the user to please state why the material is not in violation despite having every appearance of being so, and direct them to the appropriate problem pages; this does not need to be signed and can be automated for admins to the point where they only have to fill in a URL pointing to the material that was duplicated.

Just a few absurdly detailed suggestions to make it more likely to pass. ;-) JRM 18:51, 2004 Dec 10 (UTC)
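
The automated boilerplate notice described in point 4 could be generated roughly as follows. This is only a minimal sketch: the helper name `build_deletion_notice` and the wording of the message are assumptions made for illustration, not an existing template or tool. The admin supplies nothing but the article title and the URL of the duplicated material.

```python
def build_deletion_notice(article_title: str, source_url: str) -> str:
    """Return hypothetical boilerplate text for the creator's talk page."""
    return (
        f'The article "{article_title}" was speedily deleted because its '
        f"content was identical to material found at {source_url}. "
        "If you hold the copyright to that material, or are authorised to "
        "license it under the GFDL, please say so before recreating the "
        "article. See Wikipedia:Copyright problems for details."
    )


# Example: the admin fills in only the title and the external reference.
print(build_deletion_notice("Example band", "http://example.com/bio"))
```

Keeping the source URL as a required argument reflects the point made above that the notice must carry an external reference to the existing material.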

I agree with all your suggestions and your rephrasing. It is important to stress that Proposal V should only apply to text dumps. Something copied but wikified by its submitter should not be speedy deleted. - SimonP 19:09, Dec 10, 2004 (UTC)
I'd prefer a mix of sorts of the two: Any article that consists only of content in blatant, easily verifiable violation of copyright that has not since been subsequently edited or improved by another user and was submitted by a user or IP with no legitimate contributions. The editor must subsequently be informed on their talk page that such deletion has happened, with an external reference to the existing material and a friendly reminder that copyrighted material cannot be contributed to Wikipedia. BLANKFAZE | (что??) 21:57, 10 Dec 2004 (UTC)
So basically, nothing I said, except the notification, which I consider an add-on to the proposal (and has indeed been added-on by you). If you really think your formulation is clearer and better to include in policy, well, that's your opinion. As long as we're all talking about the same thing, which I don't doubt we do. I'm just a little paranoid when it comes to the wording of policy additions—people often act as if they were laws, and a lot of pointless discussion is waged over what a particular word could mean. I just thought my version would avoid such problems. Keep in mind that you're talking about speedy deletion here. I think a little extra care is warranted. JRM 19:14, 2004 Dec 11 (UTC)
No, that's not the way it is, "nothing I said, except"... I think the notification clause is a great idea on your part. I just think the existing version is worded more appropriately. For instance, your proposed version contains neither the word "copyright" nor "violation", which is in my opinion necessary. I'm all for compromise though. Perhaps —
Any article that consists only of content in blatant, easily verifiable violation of copyright or which is not immediately verifiable as compatible with the GFDL, and said article was submitted by a user or IP with no legitimate contributions and has not since been subsequently edited or improved by another user.
The contributor must subsequently be informed on their talk page that such deletion has happened, with an external reference to the existing material and a friendly reminder that copyrighted material cannot be contributed to Wikipedia. BLANKFAZE | (что??) 20:30, 11 Dec 2004 (UTC)
Sorry for being snippy. I was snippy. I shouldn't be snippy. Snippy is bad.
Explicit mention of the GFDL may have been overgeneralization on my part. The main problem of the original text is that "blatant, easily verifiable violation of copyright" isn't. You don't know if someone doesn't actually own the copyright and wants to give the material to us. It's possible—we do not require that you advertise this on the page, or something. All you can say is that it's probably violating (i.e. an anon posting the front page of a major site is quite unlikely to be the copyright holder).
That said, I can see why the magic words "copyright" and "violation" would have to be included for people not to go cross-eyed. My GFDL mention might just be semantics nobody's waiting for, and I do not insist on its inclusion.
What I still cannot live with, however, is mentioning "a user or IP with no legitimate contributions". This is silly. If user X vandalizes George W. Bush and then copies a webpage it's speedily deletable, but if user X first adds a coherent article on their favorite comic book character and then copies a webpage it's not? I understand what you're getting at, of course, but the wording could be more judicious. And it doesn't matter whether it's registered accounts or anonymous IPs doing the copying, "contributor" covers all of these. New proposal, minus my GFDL minutiae but without arbitrary distinctions:

Any article that consists only of blatant, easily verifiable violation of copyright, which has undergone no significant editing by its creator and no editing at all by anyone else.

This is a reformulation of my "identical" demand above, which should be slightly easier and also uncontroversial: it doesn't really matter whether people have "improved" it (what's improvement?) only if they've edited it. If anyone else has: hands off and report as copyright violation.
Note that copyrighted material can be contributed to Wikipedia. I wager you do it all the time, my friend. You maintain copyright over all your edits, unless you specify otherwise. The GFDL just allows others to use the fruits of your labour, under certain conditions. The notification should thus not be a reminder that something we all do is not allowed. I propose to keep my original here:

The editor must subsequently be informed on their talk page that such deletion has happened, with an external reference to the existing material, and instructions on how to prevent this from happening again.

The "instructions" would be simple enough, namely, to inform someone that you are the copyright holder of the material. I'm not sure if that should be a separate page or just Wikipedia:Copyright problems, but it should not just be "you can't post copyrighted material", because that's false.
Wew. Anyone else got any thoughts? :-) JRM 22:15, 2004 Dec 11 (UTC)

Thanks for your thoughts. Firstly, the "a user or IP with no legitimate contributions" is intended, I believe, to give the benefit of the doubt to users who have been good contributors. Secondly, are you sure you don't want to include a GFDL qualifier? I think it might be a good thing. My only problem with your notification clause is the word "this" from "how to stop this from happening again". It's unclear what the pronoun "this" is referring to. BLANKFAZE | (что??) 20:09, 12 Dec 2004 (UTC)

Re the GFDL qualifier: the problem is that it starts to read very much like legalese already. The GFDL phrase makes the copyright violation mention redundant, but unfortunately, people need to actually understand what it's talking about, and that's probably not going to happen, so... *sigh*
And I understand what the "legitimate contributions" thing is intended to convey, but in combination with "blatant violation" it's expressing the wrong thing: that it's alright to have a "blatant, easily verifiable violation of copyright" if the user has "legitimate contributions"! Of course that's not what you mean. Here's attempt number... Oh, what was it? I lost track. :-)
Any article that consists only of content in blatant, easily verifiable violation of copyright or which is not immediately verifiable as compatible with the GFDL, unless said article was submitted by a user or IP with legitimate contributions or has since been subsequently edited by another user.
I hope you'll agree that the "or improved" clause really is redundant; you can't improve the article without editing.
Re notification: well, d'oh! If you'd said that in the first place... That's a fairly easy thing to fix, methinks.
The creator must subsequently be informed on their talk page that such deletion has happened, with an external reference to the existing material, and instructions on how to prevent any recreation of the article from being deleted again.
Persistent recreation without heeding the warning is, of course, blockable/bannable vandalism. That clear everything up?
(We should probably archive this away, or something. It's getting very long and I'm sure not everyone who comes here is immediately interested in my nit-picking.) JRM 21:38, 2004 Dec 12 (UTC)
The best option might be to do what is already done with criterion I (patent nonsense) and criterion III (vandalism). Have a short and simple statement on the CSD page, e.g. "blatant copyvios can be speedy deleted", and leave defining blatant copyvio to a separate page that goes into the necessary depth and legalese. - SimonP 21:52, Dec 12, 2004 (UTC)

By the way, I like JRM's last revision of the proposal, I think I'm going to use that one. BLANKFAZE | (что??) 22:48, 12 Dec 2004 (UTC)
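
For readers trying to follow the logic of that last revision, it can be written out as a simple predicate. This is a sketch of one literal reading of the wording, not an official test: the class `Article`, its field names and the function `speedy_deletable` are all hypothetical, and "blatant" and "legitimate contributions" remain admin judgement calls that are modelled here as plain booleans.

```python
from dataclasses import dataclass


@dataclass
class Article:
    blatant_copyvio: bool             # admin judgement: blatant, easily verifiable violation
    gfdl_verifiable: bool             # immediately verifiable as GFDL-compatible
    creator_has_legit_contribs: bool  # admin judgement about the submitter
    edited_by_others: bool            # any edit at all by another user


def speedy_deletable(a: Article) -> bool:
    # Qualifying clause: blatant violation OR not verifiable as GFDL-compatible.
    suspect = a.blatant_copyvio or not a.gfdl_verifiable
    # "Unless" clause: either condition takes the article out of speedy deletion.
    exempt = a.creator_has_legit_contribs or a.edited_by_others
    return suspect and not exempt


# A text dump by a new contributor that nobody else has touched qualifies.
print(speedy_deletable(Article(True, False, False, False)))
# The same dump, once edited by another user, goes to Wikipedia:Copyright problems instead.
print(speedy_deletable(Article(True, False, False, True)))
```

Read literally, the "not immediately verifiable as compatible with the GFDL" clause on its own is enough to make an article suspect, which matches the remark above that the GFDL phrase makes the copyright violation mention redundant.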

"Easily verifiable"

This proposal seems to be running into resistance centered around the phrase "easily verifiable violation of copyright". I don't see how, in practice, this is any different from a less-controversial "verified violation of copyright", since the second part requires a talk page message including the external reference, which can only be provided if one's already been found. Korath (Talk) 07:47, Jan 2, 2005 (UTC)

I was really hoping this one would pass. Perhaps it should be re-proposed at a later date, but with the understanding that a specific list of common copyvio sources (e.g. allmusic.com, mtv.com, Britannica, etc.) are the only ones that count. This would mean that if I see someone post a band bio from allmusic, I could speedy it because I know it is copyrighted and allmusic is not going to release their content under the GNU FDL (I suppose it would be good to ask, to get an official denial). I couldn't speedy it, on the other hand, if it was an apparent copyvio from some random website or blog or something. This might be more likely to pass and, while more limited, could significantly simplify the copyvio process by not clogging it up with obvious copyvios. Tuf-Kat 07:22, Jan 8, 2005 (UTC)

Another idea for copyvio deletion

I voted against this proposal because I think our current copyvio procedure works fine. But I have another idea for a CSD: pages that are inherently obvious as copyvios and don't need Google testing or any other outside evidence to show that they're copyvios. Examples would include music lyrics and copies of well-known printed works or scripts. In nearly all of these cases, the material is blatantly unencyclopedic in addition to being a copyright problem. More importantly, music lyrics and the like could get Wikipedia in far more trouble than routine, garden-variety copyvios such as text dumps from websites. Szyslak 02:56, 10 Jan 2005 (UTC)

  • A clarification: "inherently obvious copyvios" DOES NOT mean something "appears somewhere else." Our regular CP procedure works fine for those cases. Szyslak 03:37, 10 Jan 2005 (UTC)
This could be combined with my idea just above it. Note, however, that lyrics to old public-domain songs are fair game, to whatever extent quotation is necessary (full texts go to Wikisource; short ones, however, may well be usefully quoted in their entirety here). There's no reason to wait around on a copyvio of text from a source with clear copyright that is extremely unlikely to allow for GNU FDL distribution. Tuf-Kat 22:17, Jan 10, 2005 (UTC)
I like your idea. Maybe someone with a really huge amount of time on their hands could put together a bot that can find text dumps from allmusic, Britannica, Encarta, etc. by scanning for text dumps with certain capitalization and syntax patterns that have been edited by only one or two users.

Also, I was thinking we might want to come up with another way to handle these cases besides speedy deletion—"speedy copyvio," perhaps. That would lighten the load on regular copyvio and keep borderline cases out of speedy deletion. Szyslak 03:03, 11 Jan 2005 (UTC)
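
The bot idea floated above could start from a very crude filter like the one below. This is a sketch under loud assumptions: the function name `looks_like_text_dump`, the markup patterns it checks and the two-editor threshold are illustrative guesses, and nothing here talks to the live wiki. It only flags candidates for a human to review; it does not establish that anything is a copyright violation.

```python
import re

# Hypothetical signs that someone has wikified the text: links, bold,
# section headings, templates. A raw text dump usually has none of these.
WIKI_MARKUP = re.compile(r"\[\[|\]\]|'''|^==.*==\s*$|\{\{", re.MULTILINE)


def looks_like_text_dump(wikitext: str, editors: list[str]) -> bool:
    """Flag non-empty, unwikified pages edited by at most two distinct users."""
    has_text = bool(wikitext.strip())
    few_editors = len(set(editors)) <= 2
    unwikified = WIKI_MARKUP.search(wikitext) is None
    return has_text and few_editors and unwikified


# A pasted biography with no markup and a single editor is flagged.
print(looks_like_text_dump("The band formed in 1990. They released two albums.", ["203.0.113.5"]))
# The same text, once wikified, is not.
print(looks_like_text_dump("'''The band''' formed in [[1990]].", ["203.0.113.5"]))
```

A real bot would also need source-specific patterns for sites such as allmusic or Britannica, which are not attempted here.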