55

Although this has been reported before (1, 2), and the current canonical guidance for how to approach sites or scrapers copying SE content is also available, I wanted to share, in a single canonical location, what I have learned about the MO of these channels and about going through the removal process with YouTube.

A few days ago, a single YouTube video was brought to my attention that copied not only my content, but also the content of the question and multiple answers from an SE network site. I dug around and found an entire channel dedicated to reposting SE content in a manner that I consider inconsistent with the CC BY-SA license under which the content is made available. Further searching revealed multiple channels doing this.

The channels are:

  • https://www.youtube.com/@RoelVandePaar (this is now the largest channel by number of uploads on YouTube)
  • https://www.youtube.com/@peterschneiderQandA
  • https://www.youtube.com/@LukeChaffeyTechInfo
  • https://www.youtube.com/@gamershelpadvice
  • https://www.youtube.com/@pythonoracle
  • https://www.youtube.com/@computeroracle (they do appear to try to make their videos CC BY-SA, but do not follow the attribution requirements for the content used or CC's requirements for licensing under one of its licenses - I do not believe they understand or comply with the license terms)
  • https://www.youtube.com/@Nida.Karagoz (they do appear to try to make their videos CC BY-SA, but use YouTube's licensing feature to mark as CC BY and neglect to follow CC's marking guidance in the video and/or video description)
    • This channel is also associated with a website which reproduces SE content in text form: https://the-computer-oracle.avk47.com/
  • https://www.youtube.com/@SophiaWagnerQandA
  • https://www.youtube.com/@SophiaWagnerQandAs
  • https://www.youtube.com/@theenglishoracle
  • https://www.youtube.com/@MaxProTips (references the Stack Exchange licensing page in several videos; however, I haven't found a video that references a specific question or appears to take content verbatim, as some of the other channels do)

I've found the following characteristics:

  • The title of the video is usually the question on the SE network site.
  • Most of the channels link to https://meta.stackexchange.com/help/licensing in their Description. However, some channels have poor formatting that breaks this link.
  • Most of the channels identify the users whose contributions they have stolen. However, most of them also have poor formatting that prevents these links from turning into hyperlinks. They also do not identify which pieces of the video were taken from which user.
  • Most of the channels link to the question that was stolen.
  • Some of the channels convert the body of posts to images, but it's not clear how. In some cases, external URLs are formatted like URLs (but, since they are in a video, are not usable). In other cases, the anchor text for a URL is replaced by the URL itself. Depending on how the text is rendered, this could make the content difficult to read and understand.
  • Some channels use machine-generated text-to-speech to read the questions. Others overlay music and scroll through the text. I have not found a channel where a human reads the content or adds any kind of commentary or discussion. Some channels do have a human intro, but I have not found any intro that is even tangentially related to the content of the question or its answers.
  • These channels post frequently. They have tens of thousands or even millions of videos. Some videos even have hundreds of thousands of views. It does appear that they are attempting to monetize their channels (which is allowed by the CC BY-SA license).

I do not believe that, as a rule, these channels are following the Attribution-ShareAlike requirements:

  • The Attribution is insufficient. The best practice guidance is TASL (including the Title, Author, Source, and License of the work). The title of the video does correspond to the title of the question, and a title is not required if not provided. Although authors are listed and links to their SE profiles are provided, neither the video nor the description associates blocks of content with a specific author. The specific CC license is not mentioned, which is important since SE content could be under one of three licenses depending on when it was posted - someone must visit two pages (the Meta SE Help Center page and the question itself) to learn the license and then a third page to read the license. There are also no direct links to the answers themselves. I do not believe that this satisfies Section 3(a)(2) of the CC BY-SA 4.0 license, Section 4(c) of CC BY-SA 3.0, or Section 4(c) of CC BY-SA 2.5.
  • The ShareAlike clause is violated. Although there are fuzzy boundaries around what constitutes adaptations, derivative works, and reproduction, if converting the content from a web page to a video does constitute an adaptation or a derivative work, then the video needs to be shared under a CC BY-SA 4.0 or compatible license.
  • Most channels attempt to capture the licensing as "Content (except music & images) licensed under CC BY-SA". However, this statement does not provide a link to the appropriate CC BY-SA license terms or reference a version (or versions, in some cases) of the CC BY-SA license. It also assumes that the video is not a derivative work that must follow the ShareAlike terms, which would require that the full video and audio be made available under CC BY-SA or a compatible license.
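
To make the TASL point concrete, here is a minimal Python sketch of what a per-post attribution line could look like. It is only an illustration: the `Post` fields, the function names, and the example values are hypothetical, and the cutoff dates are what I understand the Stack Exchange licensing help page to say, so verify them there before relying on this. It also ignores the complication that an edited post can contain contributions made under different license versions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

# ASSUMPTION: cutoff dates as I understand the SE licensing help page.
# Checked newest first; anything older falls back to CC BY-SA 2.5.
LICENSE_CUTOFFS = [
    (date(2018, 5, 2), "CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0/"),
    (date(2011, 4, 8), "CC BY-SA 3.0", "https://creativecommons.org/licenses/by-sa/3.0/"),
]
FALLBACK = ("CC BY-SA 2.5", "https://creativecommons.org/licenses/by-sa/2.5/")


def license_for(posted: date) -> Tuple[str, str]:
    """Return (license name, license URL) for content posted on a given date."""
    for cutoff, name, url in LICENSE_CUTOFFS:
        if posted >= cutoff:
            return name, url
    return FALLBACK


@dataclass
class Post:
    """Hypothetical record for one question or answer used in a video."""
    title: str
    author: str
    author_url: str
    source_url: str  # direct link to the question or the specific answer
    posted: date


def tasl_line(post: Post) -> str:
    """Build one Title / Author / Source / License line for a post."""
    name, url = license_for(post.posted)
    return (
        f'"{post.title}" by {post.author} ({post.author_url}), '
        f"{post.source_url}, licensed under {name} ({url})"
    )


if __name__ == "__main__":
    example = Post(
        title="Example question title",
        author="example-user",
        author_url="https://example.com/users/1",
        source_url="https://example.com/a/2",
        posted=date(2024, 1, 21),
    )
    print(tasl_line(example))
```

One such line per question and per answer, placed next to the part of the video where that content appears, would address the gaps listed above: it names the author of each block, links directly to the answer, and states the license version with a link to its terms.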

I have had success in getting YouTube to remove these videos that I do not believe fully comply with the CC BY-SA license. Doing so requires emailing or submitting a web form that includes a link to the video, a link to the source content, the timestamps in the video where the content appears, and some statements asserting that you own the content and are authorized to make the request - the details are all in the YouTube Help Center. If you find your content on one such channel, you may be able to request the removal of the videos that do not comply with the license terms.

I'm also finding that the YouTube removal process (and US copyright law) doesn't favor individual contributors. The channel creator can file a counter-notification, including making factually incorrect statements, that puts the content owner (the author of the SE post) in a position where they need to commit to legal action or, after 10 business days, the content is restored and the copyright strikes are removed. There is no way for a copyright owner to talk to someone at YouTube to provide information, so you're forced either to leave your work unprotected or to find a lawyer.

asked Jan 21, 2024 at 22:15
17
  • 8
    I suspect this would break both sites' AUPs - and frankly there's no point in AUPs if you're not going to enforce them Commented Jan 22, 2024 at 1:29
  • 9
    AUP = acceptable use policy Commented Jan 22, 2024 at 3:06
  • 4
    That style of video is usually spammers. They steal content from anywhere they can: tv/movie clips, Reddit stories, even ripping off some other video creators/channels entirely - with the end game usually being to get users to click (spam) links in the video description and/or comments. Unfortunately, individually nailing them with CC infractions probably won't meaningfully make a dent. The best case is they improve their legit links/attribution in order to avoid having videos taken down. Whether that would be considered a win or not.... up to you. Commented Jan 22, 2024 at 3:43
  • 2
    Honestly, combatting these channels on improper attribution and sharealike technicalities is not really solving anything. If they gather thousands of views and thus make a substantial amount of money, they will get it right eventually. While we may not like it, making zero-effort copies in video form and monetizing them is allowed, given that licensing requirements are followed. And while these licensing requirements should be followed, the fact that they aren't now isn't that relevant, no-one wants to reshare or adapt these videos and people looking for the original author can find them. Commented Jan 22, 2024 at 10:21
  • 4
    @ErikA I don't follow. Are you saying that no one should be protecting the work and ensuring it's made available under the rules just because you don't think the original authors are being harmed and people can still find the original content? Not only is it illegal (they are violating the license terms), it's unethical. Commented Jan 22, 2024 at 10:24
  • 2
    @Thomas What I'm saying is that if someone does make a genuine effort to attribute and share-alike a piece of copied content, but doesn't exactly match the requirements, that may not be worth going against. I think the unethical part is mainly profiting off of others' work without adding value, not "not exactly attributing and licensing properly". And that's explicitly allowed. Not following the license properly is more incompetence than unethical behavior. Commented Jan 22, 2024 at 10:31
  • 7
    @ErikA I still don't understand why you believe that people who don't follow the rules of the license shouldn't be gone after. Ensuring appropriate attribution is, by far, the most important requirement for me, and that's not being met here. But the material should also be available for anyone who may want to use it in the future as well. If this was an example of one or two cases, I'd probably think differently. But these are tens of thousands of videos stealing from hundreds or thousands of people and they deserve to get shut down. Commented Jan 22, 2024 at 10:41
  • 1
    It's a computer script that needs a bit of tweaking to attribute and share properly. You're not making sure it's gone, you're making sure the script is a bit better written. Which is valid, but thinking "people should be gone because they don't format the user URL properly and don't reference the proper CC BY SA version" is weird to me. And the problem is tiny compared to the scale at which GenAI companies are stealing our content, without making a genuine attempt to share or attribute. It's just a bit more blatant. Commented Jan 22, 2024 at 10:57
  • 6
    @ErikA If you don't care about people following the license(s) your content is under, I don't know why you're here. I don't care what people do if they do what's allowed. And don't bring GenAI into this because we don't need the tangent. No, GenAI companies are not "stealing our content" because obtaining content for training is protected under fair use. These people are stealing our content because what they are doing is not protected and they are violating the license. Follow the license and we're all good. If that means updating the script, then that's what it takes. Commented Jan 22, 2024 at 11:51
  • 1
    I do care about the licenses. I just don't think takedowns for not formatting attribution best practices but clearly attributing, and not noting the exact version of CC BY SA licenses but clearly licensing it as CC BY SA, while you have no authority as you are not the copyright holder, are appropriate. Commented Jan 22, 2024 at 12:49
  • 9
    @ErikA They aren't "best practices". The requirements for attribution are clearly specified in the license and the best practices give one possible suggestion. Since the requirements laid out in the license are not met (using the best practices or some other accepted method), the license is being violated. And, as the copyright holder to my work, I'm requiring that people who use my work actually read and follow the license terms fully. Commented Jan 22, 2024 at 12:51
  • 1
    @ThomasOwens No, I'm the final edit (edited the tags) the person clearly isn't going on the revisions page to actually attribute correctly. Commented Jan 22, 2024 at 21:30
  • 1
    There is also Stack Exchange's particular AUP. Commented Jan 23, 2024 at 19:27
  • 1
    @This_is_NOT_a_forum I'm looking into this. And SE is looking into this now. I'm pretty sure there is scraping involved, at the rate videos are created. But there needs to not only be evidence of scraping, but the ability to detect and shut down the scraper. And it's not clear about stuff scraped before that policy was in place. So...pending on that front. Commented Jan 23, 2024 at 19:47
  • 5
    @TymaGaidash I suggested that Thomas do that to avoid inadvertently pouring SEO juice on those channels. He thought it was a good idea. Commented Mar 6, 2024 at 2:34

2 Answers

14

Since it was brought up in the comments, if you want to go hunting for SE content on YouTube, a basic way to start is by googling.

For the MSE licensing link, you can find a good many by googling link:meta.stackexchange.com/help/licensing site:youtube.com

Otherwise, you can try other searches, such as

site:youtube.com AND ("stackoverflow.com" OR "stackexchange.com" OR "askubuntu.com" OR "superuser.com" OR "serverfault.com" OR "mathoverflow.net")

That assumes that the link's rendered text uses the URL itself. Otherwise you'd need to write something like this:

site:youtube.com AND (link:stackoverflow.com OR "stackoverflow.com" OR link:stackexchange.com OR "stackexchange.com" OR link:askubuntu.com OR "askubuntu.com" OR link:superuser.com OR "superuser.com" OR link:serverfault.com OR "serverfault.com" OR link:mathoverflow.net OR "mathoverflow.net")

You can add your username in quotes in the query if you want to look specifically for your own content. For example,

site:youtube.com AND (link:stackoverflow.com OR "stackoverflow.com" OR link:stackexchange.com OR "stackexchange.com" ...) AND "starball"

Actually, using your (site-specific) user ID seems to catch quite a bit of stuff too (if the video descriptions link to your user profile). For example, site:youtube.com AND (link:stackoverflow.com OR "stackoverflow.com") AND "11107541". Instead of "<user-id>", you can also use "users/<user-id>".

The site list I'm using isn't exhaustive over all existing SE site domains, but it mostly covers sites that don't use a stackexchange.com subdomain (see Why do some Stack Exchange sites have their own domain names?).

If you want to expand the search to sites other than YouTube, you'd instead replace site:youtube.com with -site:stackoverflow.com -site:stackexchange.com -site:askubuntu.com -site:superuser.com -site:serverfault.com -site:mathoverflow.net.
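
If you'd rather not hand-edit these long queries, the pattern above can be sketched as a small Python helper. This is just an illustration of the string-building, not a search client: the names `domain_clause` and `build_query` are hypothetical, it only produces text to paste into a search box, and whether a given search engine honors each operator is up to that engine.

```python
from typing import Iterable, Optional

# The same (non-exhaustive) domain list used in the queries above.
SE_DOMAINS = [
    "stackoverflow.com",
    "stackexchange.com",
    "askubuntu.com",
    "superuser.com",
    "serverfault.com",
    "mathoverflow.net",
]


def domain_clause(domains: Iterable[str]) -> str:
    """Build (link:a OR "a" OR link:b OR "b" ...) for the given domains."""
    terms = []
    for domain in domains:
        terms.append(f"link:{domain}")
        terms.append(f'"{domain}"')
    return "(" + " OR ".join(terms) + ")"


def build_query(
    domains: Iterable[str] = tuple(SE_DOMAINS),
    site: Optional[str] = "youtube.com",
    identifier: Optional[str] = None,
) -> str:
    """Assemble a search query string.

    site: restrict results to this site; pass None to search everywhere
          except the SE domains themselves (the -site: variant).
    identifier: optional username, user ID, or "users/<user-id>" to require.
    """
    domains = list(domains)
    if site is not None:
        scope = f"site:{site}"
    else:
        scope = " ".join(f"-site:{d}" for d in domains)
    parts = [scope, domain_clause(domains)]
    if identifier:
        parts.append(f'"{identifier}"')
    return " AND ".join(parts)


if __name__ == "__main__":
    # YouTube only, all listed domains, filtered by a username.
    print(build_query(identifier="starball"))
    # Everywhere except the SE sites themselves, filtered by a user ID.
    print(build_query(site=None, identifier="users/11107541"))
```

Calling `build_query(["stackoverflow.com"], identifier="11107541")` reproduces the user ID example given earlier, and passing `site=None` swaps the `site:youtube.com` restriction for the list of `-site:` exclusions.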

answered Jan 22, 2024 at 5:49
0
-11

This just feels weird. I had to make an account just to write this. On one hand, I get crediting, and I don’t have a problem with that, but being a stickler over the exact written format of attribution is silly in this day and age. As long as attribution to the proper channels is made, it seems ethical. Also, why stop the flow of information that helps people learn about technology? Some people don’t go to Stack Exchange or may not know what it is. Having a video version of the article or answer is just another means of giving information out and bringing new people to the site. I don’t know how much of the users get paid to answer questions on here, but something tells me that isn’t the purpose of the site. Now, if we have established that making money off of the content is legal, if unethical, because of a change of format, I also don’t see the problem. It may be opportunistic, but you didn’t think of it and they did. We established that they put at least a little effort into making a legal distinction and crediting. Maybe they should have the money thrown in their direction for sites like this. This seems like it benefits all parties, unlike LLM scrapers, which literally only benefit the mega corporations and give them more wiggle room in lobbying for more rights for the corps while individuals have fewer rights over their IP. The corps are lobbying explicitly for rights for me but not for thee. So, if on the individual scale it’s universally beneficial, why don’t we let them stick it to the man? It seems silly that we’re raising a stink at the little guy spreading quality information.

5
  • 7
    "I don’t know how much of the users get paid to answer questions on here" all contributions are voluntary, there is no payment. Commented Apr 27 at 6:57
  • 8
    These folks literally do a stock intro, and often machine generated reading of the posts with insufficient attribution. It's... unethical no matter how you slice it Commented Apr 27 at 8:33
  • 7
    I have a different take than @VLAZ. Attribution is our payment. Especially on the professional sites, making sure that words and ideas are appropriately attributed to the author is how our professional reputation off the SE network is built. There's no legal issue with monetizing content. There are legal issues with improper attribution and ethical concerns about how it is being monetized, such as using stock intros, automated "slide shows," and machine-generated speech. Commented Apr 27 at 11:14
  • 7
    In essence, you tell us that you proudly know nothing relevant about our community, yet still feel that you have a firm basis for ignoring our content license, as well as possibly the YouTube terms of service. And also that by doing so you can supposedly help the site by getting people to visit it - people who understand even less about the site than you do. Commented Apr 27 at 15:04
  • 1
    Welcome to MSE! That aside, no one here is paid. Just because you are not getting paid does not mean you want to get paid. Would you happen to be one of the video makers with how defensive you are? Commented Apr 28 at 3:30
