SEO Experiment: Duplicate Content on Amazon Reviews

I had old reviews on Amazon.com and wanted to create my own blog with reviews, so I started to move the old Amazon reviews over. I was curious to see whether my pages would be indexed, or whether the Amazon pages would be favored as the canonical version.

On Sept. 26, I copied 9 reviews from Amazon to Product8Reviews. P8R is a relatively new site, but it was already indexed in Google.

The pages looked like this. The review text was copied, and a link to the product page was at the end.

I submitted the URLs through Search Console, and they were accepted within half a day or so. Search Console takes a few days to prepare reports, so I waited to see if the pages were deemed legitimate.

Over the next few days, I also copied more reviews to product8reviews.wordpress.com.

Results

On October 1, I checked to see if the Google index had the pages, and whether they were being hidden as duplicates.

The results were as follows:
Pages in index: 21, on Sept. 28.
A search for “product8reviews wordpress.com” pulled up 5 results, two from my site.
Clicking on the “repeat search with omitted results” returned 131 results, including many of the reviews.

So, Google was hiding results that looked like duplicates of reviews on the Amazon site.

On October 1, I started to delete reviews from Amazon, so my pages wouldn’t be considered duplicate content.

October 3, 2020 – a search for the rice cooker page title “Sanyo ECJ-N55W 5-1/2-Cup Rice Cooker” turned up my page on page 7 of the results, so it ranks somewhere past position 60.

The results were as follows:
Pages in index: 21, on Sept. 28.
A search for “product8reviews wordpress.com” pulled up 2 results, both from my site.
Clicking on the “repeat search with omitted results” returned 127 results, including many of the reviews.

Oct 4, 2020 – data from Sep 29 to Oct 2 were added to Search Console. The results:

Pages in index: 32
A search for “product8reviews wordpress.com” pulled up 2 results, both from my site.
Clicking on the “repeat search with omitted results” returned 110 results, including many of the reviews.

So, I think what’s happening is that searches for “product8reviews” will bring up some pages, but not all of them. Google is hiding the rest as being “too similar”. It’ll keep adding them to the index, though.

Why are these pages seen as similar?

Could it be the Text to HTML ratio?

The tools (1, 2) say it’s 5%. That sounds low. Google says it doesn’t use the ratio as a signal, but Google does need some way to distinguish duplicate pages from merely similar pages, which is a fuzzy process.
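For reference, here is a rough sketch of how a text to HTML ratio could be computed. I don't know how the online tools actually calculate it, so this is an assumption: visible text characters (ignoring script and style contents) divided by the total characters in the page source. The names TextExtractor and text_to_html_ratio are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def text_to_html_ratio(html: str) -> float:
    """Visible-text characters divided by total HTML characters (assumed definition)."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation in the source doesn't count as text.
    text = " ".join("".join(parser.parts).split())
    return len(text) / len(html)


if __name__ == "__main__":
    sample = "<html><head><style>p{color:red}</style></head><body><p>Hello world</p></body></html>"
    print(f"{text_to_html_ratio(sample):.1%}")
```

A theme with heavy markup and inline scripts pushes this number down without changing the review text at all, which is why switching themes can move it.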

Action on Oct. 4: change the theme. The current theme is Karuna. The new theme to test is Independent Publisher, which results in a 6% text to code ratio.

Text ratio may be important. I noticed that this article was shown in the results, while shorter ones weren’t. However, this other article was not shown. Not only that, this second article appears to be blocked from search results for “allintitle:nft rarible digital art market”. This is kind of stunning: the article is not spam, but seems to be identified as such.

Is the semantic markup in the theme important?

These are the HTML5 tags like <main> and <article>. I see a bunch of articles about this, but nothing definitive from Google. I dug around Amazon’s code, and found no sign of HTML5 semantic markup.
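Instead of eyeballing the page source, a check like this could be scripted. This is only a sketch: the list of tags in SEMANTIC_TAGS is my own pick of common HTML5 sectioning elements, and count_semantic_tags is a made-up helper name.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed list of "semantic" HTML5 elements; adjust to taste.
SEMANTIC_TAGS = {"main", "article", "section", "nav", "header", "footer", "aside"}


class SemanticTagCounter(HTMLParser):
    """Counts opening tags that belong to SEMANTIC_TAGS."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in SEMANTIC_TAGS:
            self.counts[tag] += 1


def count_semantic_tags(html: str) -> dict:
    """Return a {tag: count} dict for the semantic tags found in the HTML."""
    parser = SemanticTagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


if __name__ == "__main__":
    page = "<main><article><p>Review one</p></article><article><p>Review two</p></article></main>"
    print(count_semantic_tags(page))
```

An empty dict means the page uses none of the listed tags, which is what I saw by hand on the Amazon pages.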

So, as much as I like the idea of optimizing semantic markup, I’m not convinced it matters. Could it matter? Of course it could. Anything is possible.

2011-2012?

I was curious about why there were so many reviews for this rice cooker, and also why 2012 and 2011 were coming up in the URL so often, so I poked around, and found a bunch of review sites that looked like “spun” articles. They took a review, and altered it, so it wouldn’t be detected by Google.
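To get a feel for why spinning might or might not work, here is a sketch of one textbook way to measure how similar two texts are: break each into overlapping word triples ("shingles") and compare the sets with Jaccard similarity. This is not Google's method, which isn't public; it's just a common technique, and the function names and the sample sentences are made up for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word tuples in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "this rice cooker makes great rice every single time"
    spun = "this rice cooker makes good rice every single time"
    print(jaccard(original, original))
    print(jaccard(original, spun))
```

Swapping a single word changes every shingle that contains it, so the score drops quickly on short texts. That may be part of why spinning was attractive, although a real detector presumably uses more than this.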

Hundreds of these were linked together via pages all over the place, like Facebook. I think Facebook may have become a kind of link-laundering site. Some sites just linked to their own pages, hundreds of times.

It was like looking at a graveyard of old SEO machines.

These pages were often broken, but they came up at positions in the 20s and 30s. Would shoppers dig that deep to buy something?

An Oddball Site Fol ve . net

I was digging through some reviews and noticed that the above site had a good rank, but also stole content from Amazon. Search for “Being japanese-american, i have been raised around rice cookers since birth.” It’ll bring up Amazon, but also this other site, which just copied a bunch of reviews into a mega review.

I wonder how they did that. How did they get around Google detecting duplicate content?

(Follow or subscribe to the blog to be informed when this experiment is complete. It’ll be linked from an update post.)