Amazon checks on "data-mining"

Posted: Thu Jul 16, 2020 11:04 pm
by WebExaminer
A few weeks ago, Andy gave me some useful advice about extrapolating information from Internet Forums.

I'm thinking of publishing my book on Amazon and have been ploughing through all the guidance. There's a reference to the author being responsible for copyright clearances but nothing that I could find about Amazon's own views on copyright. This is not helped by Amazon's webpages being mainly American-biased. I did ring the help centre, specifying a call back from the UK, but nevertheless got one from the States. I asked how much of my book could include copyright material without consent and was told "10 per cent", but on reflection I'm wondering whether the Help guy meant I could use 10% of a work in copyright (which somehow doesn't sound right).

Perhaps 6% of my text consists of direct quotes, about 1% being from people who have given me their consent. The others are all attributed to their authors and sources, though in several cases the quotes are presented in the same way as those for which I have consent.

People (not Amazon employees) on the Kindle Direct Publishing Community are sucking their teeth about my approach and one has talked darkly of "the KDP bots flag[ging] your book for either data-mining the internet or plagiarism, a KDP staffer might block your book for either failing quality assurance standards or creating a poor customer experience".

Plagiarism it certainly isn't, and two-thirds of my text is original, based on my own experiences, to which bits & pieces from various websites have been added, either with attributions in the text or indirectly in the bibliography. (This has been discussed to my satisfaction in my original thread.)

I'm wondering if anyone has had any experience or knowledge of Amazon expressing unhappiness about what it perceives as misuse of original material, and how easy it is to discuss the matter with them - not so straightforward if the employee is reflecting American, not British, views on copyright. The Help guy did say that I couldn't discuss the matter in advance and Amazon could only express its opinion once they assessed my ultimate submission.

Re: Amazon checks on "data-mining"

Posted: Fri Jul 17, 2020 7:02 am
by AndyJ
Hi WebExaminer,

I can't answer your question about Amazon's approach to this, but I can add to the warnings from others about taking a purely quantitative approach, eg 10%. As I explained in the earlier thread, if you rely on the quotation fair dealing exception, then relevance becomes a major factor in deciding whether something is too much.

The first problem, as you have identified, is that the mostly American companies which provide self-publishing services tend to apply US copyright law, which differs considerably on this issue compared to UK law. For instance there is no equivalent to the quotation exception in the USA. The second problem is the use of bots to determine whether there has been copyright infringement etc. These tools are notoriously poor at taking into account the perfectly legal exceptions to copyright, known as fair dealing in the UK and fair use in the USA. Both doctrines require a certain amount of subjective assessment in the particular context of the text being reviewed, and bots are bad at this. Obviously if there is some sort of appeals process involving a knowledgeable human after an initial failure to pass the bot test, then that may be the time when these finer judgements can be applied. It will all depend on the amount of risk Amazon are prepared to take in any given 'borderline' case. And unfortunately I have no experience of that.

If you haven't already done so, you might be better off asking this question on forums about self-publication as there will be more members with experience of the subject. I suspect that the straightforward legal context will be less important than Amazon's own policy.

Re: Amazon checks on "data-mining"

Posted: Fri Jul 17, 2020 1:12 pm
by WebExaminer
Thanks as ever, Andy. As I said, people on the Kindle Direct Publishing Community are sucking their teeth about my approach, which prompted me to post here. You generally confirm what one or two of them are saying.

Re: Amazon checks on "data-mining"

Posted: Sun Jul 19, 2020 11:19 am
by WebExaminer
My first email to Amazon elicited an automated response that misunderstood my question and told me to do two things that I said in my email that I'd done already. I had a little more luck with my second attempt, the reply to which included: "As long as you hold the necessary copyright to a book you are looking to publish through the KDP platform, there should be no problem having your book published. As you already comply with UK copyright law, it is suggested ... that you contact an attorney or copyright law professional from the United States. They will be able to verify if the same copyright held in the UK would be applicable for publishing books in the U.S and other North American regions."

Which in itself is fair enough, though I can probably judge the situation for myself from links that Amazon thoughtfully provided. It ignored my enquiry about how much in the way of direct quotations taken from the Web (as assessed by its bots) would be acceptable. (And that is the real issue.)

My publishing aspirations are very modest and I shall probably seek another method of making my work available to the no more than a hundred people who might be interested.

(Should I have gone ahead with Amazon, I wonder whether someone in Britain who felt I'd infringed their copyright would resort to UK or USA legislation.)

Re: Amazon checks on "data-mining"

Posted: Sun Jul 19, 2020 1:49 pm
by AndyJ
Hi WE,

You didn't really expect a straight answer from Amazon did you?

On the subject of complying with US copyright law, the part you are concerned with is to be found in section 107 of the 1976 Copyright Act, headed Fair Use. As you can see there is no explicit exception for quotation as in UK law, but the fair use doctrine is relatively flexible. Here's a link to a fair use check list created by Columbia University which attempts to work out if fair use favours you or the various copyright owners. Sadly, when it comes to quantifying how much is too much, it only comes up with 'small' versus 'large', although it does go on to look at the relative importance within the original work (ie forums in your case) of the part quoted. And here's a link to a code of best practice in relation to fair use, put together by the American University which may also be of help.

If you believe that most of the forum members you wish to quote are based in the UK then the appropriate legal forum for dealing with any claims (if it were to go that far) would be Britain as Amazon (as the publisher) also has a UK presence.

Re: Amazon checks on "data-mining"

Posted: Sun Jul 19, 2020 4:27 pm
by WebExaminer
Thanks yet again, Andy

Having looked at the links, I think that I may be within US copyright law, apart from when I've used rather more than half of a short statement. Most of the people I've quoted wouldn't object and many will probably never know that I've quoted them! It's the thought of debating the matter with Amazon should their bots get suspicious ... Or should I say, trying to debate. And, let it be whispered, it's hardly going to be worth anyone's while to pursue the matter. But Amazon has a complaints procedure. I mentioned in my other thread that I'm not on good terms with the leading light of the forum I've consulted often in my research and which contains many of my posts (over which I have copyright, of course). I wouldn't put it past him to put in a complaint out of bitchiness. He hasn't responded to my email outlining what I'm proposing.

Re: Amazon checks on "data-mining"

Posted: Sun Jul 19, 2020 4:50 pm
by AndyJ
With regard to the fair use assessment, the US courts generally approach the issue by looking at the 4 categories individually, scoring each in favour of either the complainant or the defendant, and then looking at the total: 3 or 4 out of 4 for either party means a win. If the parties score 2 each, the courts often tend to give the economic factor greater weight than the others, which is not overly helpful in your case as there's no real economic benefit either way. Theoretically you stand to make something from selling the books, but with a low print run, I doubt if you will be retiring on the profits. On the other hand, the forum postings provide virtually no economic benefit to their authors even if they wanted to publish their own utterings on the subject at some time in the future, so that point probably goes to you.

All of that is largely academic as I don't expect the US courts to be the place where you would have to defend yourself. It may be important when it comes to arguing with Amazon's legal department, but again I think that is also a pretty remote eventuality. It's more likely some lowly member of their team will just toss a coin.

Re: Amazon checks on "data-mining"

Posted: Wed Jul 22, 2020 8:57 am
by WebExaminer
I pressed Amazon further and got this reply:

"Please know that the book that you submit for publishing on KDP should meet all our content and guidelines and you should have the copyrights for publishing the material on KDP. We have a specific team which reviews the book's to see if they meet our guidelines or not and in case if there is any issue with the book that you submitted then they will contact you via email and ask for clarification for the content present in you manuscript. It is not possible for us to confirm if your book meets our content guidelines or not before the book is submitted for review. For more information on our Content Guidelines, please check:

I have taken your comments as feedback and passed it on to the concerned team, to look into this and if possible in future to have an option where we can provide publisher's with guidance on using other's materials. Thanks for understanding our limitations."

Which I guess is as good as it's going to get. I'd already seen the Content Guidelines. I'd groaned when I read "Some types of content, such as public domain content, may be free to use by anyone, or may be licensed for use by more than one party. We will not accept content that is freely available on the web unless you are the copyright owner of that content. For example, if you received your book content from a source that allows you and others to re-distribute it, and the content is freely available on the web, we will not accept it for sale."

This still leaves the question of "how much?". I fully agree that wholesale reproduction, or even a page or so, of an out-of-copyright book should not be allowed, but what about six lines from a 1667 poem? (Rhetorical question.)

One might infer that an Amazon historical book should only comprise copyrighted material for which one has sought consent to use.

Re: Amazon checks on "data-mining"

Posted: Wed Jul 22, 2020 9:37 pm
by AndyJ
WebExaminer wrote (Wed Jul 22, 2020 8:57 am):
"Which I guess is as good as it's going to get."

I think you are right.

Re: Amazon checks on "data-mining"

Posted: Thu Jul 23, 2020 3:06 pm
by WebExaminer
In all fairness to Amazon, it might find it difficult to compose more detailed guidance; this would need to be concise but people would always find their own particular queries unanswered (a bit like Covid guidance), and then there's the cosmopolitan nature of its business, with countries having their own copyright laws. And it wouldn't want to give too much away about how sophisticated its bots are.

Colleagues on the Great War Forum report no problems getting their histories published, but then I don't know exactly when they produced their books. When trying to find out more about the situation, I came across an article of four years ago noting entire existing books being plagiarised and published through Amazon.