Well, my opinion of Consumer Reports is not good.
We could just leave it at that.
But hey, you asked.
I think that, just like a lot of folks we meet in real life, CR suffers from a lack of understanding of what they want to test, what a good way to test it would be, and how the way they run the tests affects the results.
So, let's start from that.
No matter how you run a test, someone will complain, period, end of sentence.
Naturally, they pick their favorite way and try to spin it in a way to elicit sympathy from the readers.
For example, for a long time, they tested washers with an 8-pound load (if I remember right), "because that's the average load a typical family washes". So, imagine you have a big-ass washer and dryer designed to wash a 20-pound load. Washing an 8-pound load. Because it's the "average" for a "typical" family.
There's all kinds of crap that can come from that test: the load could be too small for the machine, and either not get washed as well as a 20-pound load would, or such a small load might wear the clothes out more than a bigger one. Conversely, everything might go just swimmingly (because the manufacturer is *counting* on the test being run exactly the way CR runs it) and the machine gets a fabulous rating, but when you try to use it at the designed max capacity (either the only 3 times a year you do, or because you *needed* a machine with such capacity), all kinds of things happen, from not cleaning very well, to excessive wear and tear (on the clothes or the machine), to failure to balance properly for spin (at max capacity or at smallish loads).
In the past 30 years or so, regular readers of the magazine may have noticed (I did) that their tests are just too rigid and manufacturers take advantage of them. For example, for a while they tested only the "regular" cycle, so machines that were otherwise mechanically identical (for example, WP and Kenmore) would "clean" differently because they had different regular/normal cycles with different durations or agitation speeds. Then, when people complained, for a while at least (I dunno if this changed, I have not read their publication regularly in a while), they took to selecting the heaviest/longest cycle.
This is no way to run tests. If they were *serious*, they'd buy _every_ model that was different, to run every test. Every battery of tests should include one item, two items, three items, a "small" load, a "medium" load, a "large" load and a *full* load to the design specs, in *addition* to an "average" load of a "typical" family. Which, by the way, has gone up from 8 pounds (25 years ago or so) to at least 14 pounds. They should also include in those tests every major cycle the machine has. Even though I *can't* line dry and have to use a dryer for everything, I *still* want to know if the "permanent press" cycle really does help reduce wrinkles, or if the silk cycle and the wool cycle are worth bothering with, or if a different brand/model with just a "delicate" cycle would be sufficient. What is the difference between each cycle? Are they effective? How does buying this model/brand compare to all the other ones?
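To make the point about coverage concrete, here is a toy sketch in Python of the difference between a single "average load, normal cycle" data point and a full battery. The load sizes and cycle names are made up for illustration; none of this comes from CR's actual protocol.

```python
from itertools import product

# Hypothetical test dimensions (illustrative only, not CR's real protocol):
# one item up through the full design capacity of a big machine.
loads_lb = [1, 2, 3, 5, 10, 14, 20]
cycles = ["normal", "heavy", "permanent press", "delicate"]

# CR-style single test: one "average" load on one cycle.
single_test = [(8, "normal")]

# Full battery: every load size crossed with every major cycle.
full_battery = list(product(loads_lb, cycles))

print(len(single_test))   # 1 data point per machine
print(len(full_battery))  # 28 data points per machine
```

Even this toy matrix shows why cheating a single fixed test is easy: a manufacturer only has to tune one (load, cycle) combination, while a 28-point battery would expose how the machine behaves across its whole design envelope.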
We read their "reports" and we *still* need to go to the internet and ask real users.
Same thing with both the detergent and washer tests: I want to know what would happen in my home if I bought one of their recommended machines. Sadly, I *need* to run my machines with real clothes and real dirt, which is precisely what they *don't* do, or at least what they claim they don't do. They say they put in an 8-pound load of *clean* clothes to which they attach (if I remember right) about 10 "swatches" with a "scientific" set of stains applied to them. Sadly, since I have never seen those swatches for sale, I cannot do the only interesting thing one can do with a test, which is to *reproduce* it. Are those stains much harder to remove than the ones in my home? Are they much easier? Is the "load" of stains much more dirt than the totality of the "dirt" in my loads? How can we compare those?
Here's why I ask. Consider the dishwasher tests, for example. They don't teach us how to run those either. They just patronizingly assure us that "their dishes are *much* dirtier than your dishes", and that they run the tests in "scientifically mixed" hard water to make the test "as tough as possible". That creates problems: machines (and dishwasher detergents) that work well in hard water may foam excessively in softened or naturally soft water, and the glass may become etched. It is also possible that their dishes are *not* as dirty as our dishes.
Here too, the way they test the machines introduced problems: when they compared very competent European machines with American machines about 20 years ago, they claimed that Miele and Bosch, for example, did not clean well. That was a surprise for the owners of such machines, because their user guides were *very* clear about what to do: are you loading the dishes _right_ after the meal? You will save money (by using fewer resources) if you run the normal/regular cycle. Have you been loading the machine over a couple of days until it's full? You will need to run the "heavy soil" cycle to take care of that. Do you have pots and pans with stuck-on food? You will need the pots&pans cycle. So, imagine comparing a machine whose regular cycle runs a very short pre-wash and a wash, and which adds more water changes and higher temperatures for heavy soil and pots-n-pans, and also varies the pump pressure according to how hard the cleaning is, with a machine that was *designed* to be tested by Consumer Reports, so its regular cycle runs with more water changes. Things only equalized a bit when the "energy efficiency" plus "I want a silent machine" craze hit: people wanted machines that cost less to run, American machines started stumbling to clean with less water, and the Euro brands started labeling their more intense cycles "normal" and the stuff that used to be normal "eco".
All this crap could have been avoided *entirely* if they ran a full battery of tests, from the lightest cycle to the heaviest, with a nearly empty machine, to half-full and completely full. Compare all the machines. Let's see what actually happens.
There's also another factor that is *clear* if you know a line of products -- like you mentioned, why is it that a TOL machine seems to clean less well than a supposedly entry-level machine? We've seen that quite a lot. We've seen people on the internet in general, and here in particular, claiming that machines that come from a factory are all the same except for bells and whistles. Implying "don't be foolish, you get the same quality, don't pay more". That can be true of a particular brand, but it's not true of all brands. At least here in the US, Bosch and Miele dishwashers can be vastly different from the entry level to the top-of-the-line. Something as simple as a different rack system, or an entry-level model (which is usually what CR picks to test) having a more intense "normal" cycle (which is the one they tend to test), might skew the results. They can easily program the sensor cycles to be more aggressive in some models than others. When I had a Bosch dishwasher from 1999, for example, I saw the reverse happening: friends complained their machines were not cleaning as well, and when we compared, sure enough, my TOL (at the time) machine did clean better than their entry-level machines. The cycles were slightly different, and their rack system was less flexible; because of that, stuff I could expose to the water action just by flipping a few tines wouldn't get nearly the same scrubbing in their machines as it did in mine.
CR also seems incapable of reading the user guides and using the machines to the best possible advantage. They can hide behind "well, most users don't read the user guides either, so we're simulating what they do", but the truth is, they are not "simulating" what people do. They *are* the kind of people who will complain bitterly about something, and then, 5 to 10 years later, a new batch of workers joins their workforce and all of a sudden the rating goes from "this vacuum cleaner, dishwasher or washer sucks" to the top of the chart, even with no design changes.
Like I said before, manufacturers quickly find out what CR is testing and how, and "cheat" too. For example, in the early/mid-90's, they rated Ultra Tide the top performer, supposedly because it produced "whiter" clothing than Wisk.
So, I bought a box to try. Yes, the laundry did look "whiter" when washed in Ultra Tide than when washed in Ultra Wisk. But what a surprise it was when you looked at the light *thru* the cloth: stuff washed in Wisk was clean, while stuff washed in Tide still faintly showed the chocolate-milk stains. The enzymes in Tide were not nearly as effective; instead, they were amping up the "optical brighteners" to fluoresce so much that it hid the stains from the "scientific" color spectrophotometer or whatever it was they used at the time: the equipment shone a light on the cloth and measured the colors *reflected* from it to judge how clean it was, instead of shining a light *thru* the clothing and checking that. And even if you make the light go thru the clothing, optical brighteners can still hide a lot.
Anyway, this is all very subtle and many people can fly thru life ignoring or not knowing any of this.
It's only when CR makes some glaring mistakes that people start paying attention. It's often the folks who have good audio or video equipment, or the folks who have several expensive cars. Back then, 20+ years ago, this could be dismissed with "people are jealous" or "they bought expensive equipment, so they are trying to justify the ton of money they've spent". But with the internet, a much bigger percentage of people started comparing notes from real-life results, and when we find out that CR is wrong about one or two things we actually know a lot about (sometimes more than just one or two things), we start wondering whether they are wrong about other stuff too. Which is when we ask, and the internet does not disappoint: folks from all walks of life, with knowledge from many different areas, will tell you in detail what is wrong, and it starts being hard to ignore that CR is either super clueless, or plays super clueless *really* well to hide their biases.
The only thing I give them any credit for nowadays is when they note that something is hard to use or access (for example, "tiny buttons on the radio make it hard to use while driving"), and even then I don't take it as gospel; I just put it on the list of stuff to pay attention to when I'm trying out the product or looking at it in stores.
In any case, I have not found any testing company to be ideal -- one needs to be very aware of how they run particular tests, and what strengths, weaknesses and blind spots each organization and its tests have. Then normalize for that. Try to see the products in actual use, particularly if you have friends who have had the product for a bit. It's very normal to be excited by something for a little while; then, after you get used to the product, you see a bunch of things you didn't notice before. Sometimes you learn something new that makes using the product better than before; sometimes you find out a bunch of limits and limitations that seemed unimportant at first.
Have fun!
-- Paulo.