In reviews, the reliability of benchmarks for smartphones, CPUs, GPUs and other components has been questioned for decades, yet many people draw on that very data before buying a new product. In this article we want to explore the subject in more depth, putting forward our points of view in the usual way: simply and without unnecessary technicalities. We will also deal with tests on other hardware components, not just benchmark software. This is going to be a very long article, one for tea and biscuits.
No benchmark is above suspicion
A benchmark is a software programme that carries out a series of tests to measure the performance of a product. This product can be a component (CPU, GPU, HDD...) or the whole system (notebook, smartphone...), but also software. For more information, see the English Wikipedia page. In common parlance, we often also call 'benchmarks' tests of a different nature, such as tests on power supplies or on monitors. There is no such thing as a trustworthy benchmark in any field. Even economic and financial benchmarks, which have far brighter spotlights trained on them, turn out to be unreliable...
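To make the idea concrete, here is a minimal sketch of what any benchmark boils down to: run a fixed workload, time it, and turn the time into a score. This is purely illustrative, our own toy example; the workload, the number of runs and the scoring formula are all arbitrary choices made by whoever writes the benchmark.

```python
import time

def workload():
    # A fixed, repeatable task; the benchmark author decides what counts as "performance".
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def run_benchmark(runs: int = 5) -> float:
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    best = min(timings)      # many benchmarks keep only the best run...
    return 1000.0 / best     # ...and the scoring formula is arbitrary: higher = "better"

print(f"Score: {run_benchmark():.0f}")
```

Note how every line involves a decision by the author. That is exactly where the wine glasses of the next paragraph come in.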
Interference
The most popular benchmarking software (and platforms) belongs to private companies, and even the benchmarks recognised as 'verifiable' are presided over, or participated in, by brands with direct interests. There are many ways in which benchmarks can be influenced, yet none of them will return incorrect results. And a correct result is not necessarily true. Put simply: according to so-and-so's benchmark, a bottle of wine fills six glasses. However, the type of glass and its capacity are decided by so-and-so and can differ greatly from our glasses. The statement is therefore correct, but not true.
The bottom... of truth
Repetita SYSmark
For SYSmark, interference would seem to be an incurable vice, at least according to AMD. SYSmark is developed by a non-profit consortium (BAPCo) consisting mainly of Intel's partners, as well as Intel itself. Among the members we also find a major outlet, CNET, which in the collective imagination should be more neutral than any benchmark. In short, a nice little environment without the slightest conflict of interest.
Difference between synthetic, real-world and hybrid benchmarks
Synthetic benchmarks
Synthetic benchmarks are the most popular and therefore the best known. Names like 3DMark and PCMark sound familiar to anyone who has searched for information online at least once. Yet synthetic benchmarks are also unanimously considered the most useless. The reason is simple: they measure raw performance while ignoring every other factor, without reflecting the actual user experience. This means that a manufacturer can easily tune its hardware to get the most out of a synthetic benchmark (and vice versa *blink-blink*).
Real-world benchmarks
This class of benchmarks focuses on real workloads: the creation of 2D and 3D models, the decompression of a file, the conversion of a video, and so on. The problem is that they cannot measure the performance of individual components; the performance measured is always that of the entire system. To get around this snag, the larger magazines tend to use the same configuration, changing only the component under review. So we will have the same RAM, same mobo, same SSD and so on, but different processors, if processors of the same brand are being judged; of course, at least the mobo will change too when switching to another brand of processor. These tests should all be repeated from time to time, because drivers change, software gets updated, and there is a whole host of variables to consider. But no newspaper/magazine/website assembles a system dozens of times to re-run all the tests; they simply reuse data from past tests.
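To give a feel for it, here is a hypothetical 'real workload' test, a sketch of ours rather than any magazine's actual suite: compress and decompress a block of data, the way a user actually would, and report the wall-clock time. Whatever number comes out reflects the whole system (CPU, RAM, OS, background tasks), not one component in isolation.

```python
import os
import time
import zlib

# 50 MB of test data; random bytes are a worst case for the compressor.
data = os.urandom(50 * 1024 * 1024)

start = time.perf_counter()
compressed = zlib.compress(data, level=6)
restored = zlib.decompress(compressed)
elapsed = time.perf_counter() - start

assert restored == data  # a real-world test should also verify correctness
print(f"Compress + decompress: {elapsed:.2f} s")
```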
Hybrid benchmarks
Some synthetic benchmarks, including those already mentioned, are now considered hybrids. They perform the usual synthetic tests, then add some sort of scenario tied to real-world usage. Here again, we have a type of benchmark that is useless for understanding the performance of individual components. Simplifying... the uninformed usually ask the more experienced: 'What PC do you recommend?', and get the counter-question: 'What do you need to do with it?'. When the answer is 'a bit of everything', the hybrid benchmark is useful. In all other cases, the real-world benchmark is what is needed.
Smartphone benchmarks
If it is possible to be more useless than a synthetic benchmark, smartphone benchmarks manage it. They are both the apotheosis of easy faking (even by ordinary users, via an app plus root access) and the emblem of the gulf between a high score and the actual user experience. Like a PC, and even more so, a smartphone has to be used for days before it can be judged, and no theoretical performance figure matters.
So why does everyone publish benchmark results?
Let's start by saying that there is a difference between using benchmarks and publishing their results. We at RecensioniVere use benchmarks but do not publish the results. We use them mainly to see on the fly whether the system is stable, whether any part is about to fail, whether the unit we mounted is faulty, and so on. The numbers are taken into account too, naturally, but we refuse to fuel the race to see who is slightly better than whom, as the end user will never notice the difference.
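As an example of this kind of use, here is a crude stability check in the spirit of tools like Prime95: run the same deterministic, CPU-heavy computation over and over and verify that the result never changes. A single mismatch means something (CPU, RAM, temperatures...) is off; fifty clean passes, of course, prove very little. The code is our own sketch, not any particular tool.

```python
import hashlib

def stress_pass() -> str:
    # Deterministic CPU-heavy work: the digest must be identical on every pass.
    blob = b"stability-check" * 100_000
    for _ in range(200):
        blob = hashlib.sha256(blob).digest() * 4096
    return hashlib.sha256(blob).hexdigest()

reference = stress_pass()
for i in range(50):
    if stress_pass() != reference:
        print(f"Mismatch on pass {i}: the system is NOT stable")
        break
else:
    print("50 identical passes: no instability detected (which proves little)")
```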
What about the others?
Compared with only five years ago, many publications have abandoned benchmarks or limit themselves to publishing only the score of the tested component, without comparisons. Behaviour that we support, needless to say, as this is how it has always been for us. Those who continue to publish them do so for three main reasons. The first is that the average user, and the fanboy, demands the numbers even without understanding them. Benchmarks, especially hybrid-synthetic ones, must be interpreted; it is not enough to read them. Many times, the published charts show amazing results that do not match the reviewer's own commentary. The second reason is that the charts get reposted on other sites, blogs and forums. For ranking on Google it is important to have loads of links from other sites, and publishing mega-charts is one of the easiest ways to get mentioned. The third reason is sponsors, which we will look at in more detail in the next section.
How do I influence the outcome of a test?
Guidelines for Reviewers
The giants, but also some smaller players, are in the habit of proposing guidelines to media and influencers. And, although it is easy to see how not following them can lead to repercussions, we repeat: the guidelines are proposed, not imposed. They are presented in different ways and formats, sometimes personalised and sometimes addressed to selected/trusted publications. Here we are no longer talking only about CPUs, GPUs or smartphones; this applies to any product.
The Friendly Advice
It is an email, longer than usual, in which the friendly PR person advises how to get the most out of their new product, inviting you to pass these suggestions on to your users. It is common, rarely truly intrusive, and these emails happen to everyone, even us. They are not real guidelines but simple 'PR-ing' about the product's main features. In short, more than legitimate stuff.
The sincere recommendation
Organised ones
In very rare cases, this leads to an actual to-do list, attached as a Word or PDF document. For some products, the list is accompanied by other information material in which certain points are explained step by step. In short: you are free to test, but in the way you are told to. This pressure is exerted on sponsored influencers through very detailed contracts that include penalties for breaching an embargo or talking about this confidential material. They, too, are free to follow the list or not, to publish the review or not. In the end it is always a personal choice...
What can influence a test?
Lots of things! Reviewing a CPU without disabling MultiCore Enhancement and similar features means doping it from the start. Still on CPUs, remember that they count for very little in game benchmarks once the graphics settings are turned up to maximum. A video of how a smartphone performs, taken just after a reset, is not representative of its actual fluidity. Photographs taken with smartphones, when published, should always be 'point and shoot', without adjusting the settings. The stability of any hardware component cannot be judged simply by running benchmarks, and certainly not on the basis of a few hours of use. The same applies to notebooks. The power supply used, of which there is little talk, influences performance. In certain situations, the type of heatsink used or the way the thermal paste is applied also weigh on that #zero-point benchmark.
Tests that are too technical are not reliable
SSDs, the toughest to test
With power supplies, it would be enough to talk about stability and then help the reader consult the label, after verifying its claims. Yet they seem to be the most difficult products to review. In contrast, reviews of SSDs appear everywhere, with the usual 2-3 benchmarks. SSDs and HDDs contain our data, so they should be the best-reviewed components of all. How reliable are they? TechReport tried to answer this question in 2015, and goosebumps are guaranteed while reading. Finally, it is best not to forget that in standard tests a 500-euro SSD will hit staggering numbers; on a practical level, however, there would be little difference compared to a normal 100-euro SSD.
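For the record, those usual 2-3 benchmarks boil down to something like the following sketch (ours, simplified): time a sequential write and derive a MB/s figure. Note everything it ignores: OS caching, the drive's fill level, sustained writes once the SLC cache is exhausted, random 4K accesses, queue depths... which is exactly why the staggering numbers say so little.

```python
import os
import time

path = "testfile.bin"
block = os.urandom(4 * 1024 * 1024)  # 4 MB blocks
blocks = 256                          # 1 GB in total

start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(blocks):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())  # force the data onto the drive, not just into RAM
elapsed = time.perf_counter() - start

print(f"Sequential write: {blocks * 4 / elapsed:.0f} MB/s")
os.remove(path)
```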
Temperatures
Playing with temperatures is another easy way to influence readers. For the last ten years or so, infrared thermometers have been all the rage. They are usually the ten-euro ones; every now and then you see a 50-euro one. The point is that they could just as well cost 200 euros and would be equally useless if not calibrated for the type of surface being measured. Surfaces inside a PC are the worst possible targets for IR; probes must be used instead. And even that type of thermometer should be of good quality and calibrated, not used straight out of the box. Showing off useless equipment serves only one purpose: deceiving readers and inexperienced PR people.
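Why surface type matters so much: an IR thermometer converts the radiation it receives into a temperature using an assumed emissivity, and if the real emissivity of the surface differs, the reading is off. A rough sketch of ours, ignoring reflected ambient radiation for simplicity:

```python
# An IR thermometer assumes an emissivity (often 0.95); ignoring reflections,
# the reported temperature scales as (real_eps / assumed_eps)^0.25 in kelvin
# (from the Stefan-Boltzmann law).
def ir_reading(surface_temp_c: float, real_eps: float, assumed_eps: float = 0.95) -> float:
    t = surface_temp_c + 273.15
    return t * (real_eps / assumed_eps) ** 0.25 - 273.15

# A polished copper heatpipe at 70 degrees C (emissivity around 0.05):
print(f"{ir_reading(70.0, real_eps=0.05):.0f} C")  # about -109: absurd
```

(In practice reflections pull the reading towards ambient temperature rather than below zero, but the number is nonsense either way.)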
Monitors
Monitors are another big headache for those who have to review them. Actually, once again, it is not at all difficult for the consumer and gaming market; it is much more complicated for those who work with graphics. But there are those who want the complex tests, the usual little numbers and graphs, so something very simple turns into a pile of poorly executed tests and bad results. The only site that did thorough testing correctly was Xbitlabs. Which, in fact, closed. No sponsors for those who work well!
The rest of it
Cameras, televisions, routers, keyboards... almost every category has its slew of botched technical tests. What can we say? When there are a few thousand euros to invest in a professional device, it may be worthwhile to carry out extensive testing, and even to seek out someone who does it properly. For other products, we come back to the marketing and #zero-point discourse. Conditions inside homes are not the same as in a laboratory, a garage or an office. They are not real conditions in real scenarios. And no editor/influencer is able to really test all the variables. They sell smoke.
Products selected for reviewers
Back to influences, this time indirect ones. Ever since the first reviews appeared in newspapers and magazines, there has been a habit of submitting only selected products to the reviewers' judgement. Exclusives. It happens with all products that require particularly painstaking quality control: from cars to PC hardware to medical equipment offered by representatives to facilities. But when it comes to dies, there is a further selection phase, one that affects all consumers.
Binning
Fixed and rated
The dies that fail to perform certain operations are, if possible, repaired. In fact, there may be 'spare components' on the little rectangles that can be unlocked in the event of faults. At this point, once the tests are finished, comes the classification. Units with too many errors, and those below the minimum standard, end up back in the foundry. The others are divided according to performance. In theory the whole wafer should consist of identical dies with identical performance, but the error rate is high and only a few rectangles come out perfect.
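Purely by way of illustration, binning amounts to a classification rule like the sketch below. The thresholds and bin names are made up by us, not taken from any real vendor, but the logic is the one described above: too many defects means scrap, perfect rectangles become flagships, everything else is cut down and sold lower in the range.

```python
# Illustrative binning rule: thresholds and names are invented for the example.
def bin_die(defects: int, max_stable_mhz: int) -> str:
    if defects > 4 or max_stable_mhz < 1200:
        return "scrap"        # back to the foundry
    if defects == 0 and max_stable_mhz >= 2000:
        return "flagship"     # the rare perfect rectangles
    if defects <= 2 and max_stable_mhz >= 1600:
        return "mid-range"    # working die, some units fused off
    return "entry-level"      # heavily cut down, clocked conservatively

wafer = [(0, 2100), (1, 1750), (3, 1500), (6, 900), (0, 1650)]
for defects, mhz in wafer:
    print(f"{defects} defects, {mhz} MHz -> {bin_die(defects, mhz)}")
```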
Marketing
When you read that a new graphics card has the same GPU as a higher-end model but with some features removed, it is nothing more than an imperfect GPU saved from the foundry. And no, there is no way to make it perform as well as its luckier twin. The same goes for CPUs: the architecture may be the same, but if one model sells at 100 and another at 600, it means the 100 model came out rather badly and is not worth forcing. Another reason to think several times before launching into overclocking. ;)
Silicon lottery
Yet, more or less directly, it is the manufacturers themselves who push the overclocking button. They do so by demanding the publication of the appropriate tests, explaining to reviewers that 'it's the high-end model but clocked lower, maybe pushed up it becomes the same... *blink-blink*'. They do it blatantly with the high-end products, the perfect rectangles, selling them as already overclocked, unlocked, and so on. The consumer is thus steered towards buying something that *might* turn out better than the price paid. Might, indeed, because it is by no means guaranteed.
Perfectly imperfect
The silicon lottery is so called because all the rectangles differ from one another. No manufacturer makes a further selection to guarantee high 'overclockability'. In theory, the best parts should also be the most stable under heavy overclocks, but no one can say by how much without testing them individually for that specific purpose. Normally, the advice should be to stick to the specifications under which the product is sold, at least for the first two years of warranty and any extensions. Instead, the general advice is to risk and gamble, and then, in fact, spend more. Until a few years ago, people even went so far as to reactivate cores that manufacturers had disabled because they failed the tests. Today that is impossible, because they are physically disabled (the same goes for compute units). Yes, there was a time when forum gurus and influencers would say: buy this model so you can activate the failed cores (and fry them).
Partners of manufacturers are not immune
While repairs are generally carried out by the manufacturers, binning is done by the partners, that is, those who will mount the blessed rectangles on their own products. Thousands of rectangles are spread out on trays and shipped to the factories. There they are placed on the printed circuit boards and tested, a hundred or so at a time, then catalogued. Video card makers also cross their fingers when they receive the trays from Nvidia or AMD, because the binning will tell them the default clocks. The overclocking headroom, on the other hand, they will only know once the actual video card is assembled. That is, if they decide to test them one by one, which they do not. The famous factory overclock is the reasonable one, the one that, according to their data, the little rectangles in that range should handle without melting anything within the 2-3 year warranty period. If the board melts after the three years: so much the better!
Partners of manufacturers are not immune /2
Beware: dies discarded by one brand can be used by another! The trays with the rejects go back to the producer who, after issuing a partial refund to the partner, sells them on to someone with lower requirements. Some manufacturers, to protect their prestige, remove their own brand from these products at this point, and the brand of the new buyer appears. Others leave it on but charge a little more. That is why some cheap Chinese devices have components of a certain prestige yet perform nowhere near as well as higher-end products!
Cherry-picking
Engineering samples
We could call them almost-prototypes. They are the result of pre-production, and the recipients are usually the brand's partner companies, who use the product as a reference model to start working with. It is very rare for these products to end up in the hands of the press or to be exhibited at trade fairs. Sometimes, in very limited quantities, they are sent to major influencers for an initial opinion, especially on the design. When this happens, it usually involves PC cases, headsets or mice (but also perfumes, clothing, etc., if we want to step outside the technology sphere). When it involves processors and video cards, it means that actual production is already underway. Something can be found on eBay by searching for 'brand_name + engineering sample', bearing in mind that such units are less stable and outside any kind of warranty.
Press samples
These are usually products identical to those in the shops which, at most, have different packaging. Perhaps they lack accessories or are sent in spartan fashion. For some brands, the sending of samples is a kind of sponsorship, which may stop if reviews are negative or the guidelines are not followed. Many of the 'used, like new' items for sale on eBay and similar platforms belong to this category. These products are not subject to any kind of selection.
Golden samples
Golden samples are perfect products. They are verified several times and may also come in different, more lavish packaging. They are gifted to influencers whose opinion matters a great deal in a given field, (almost) regardless of their follower count on social media. For example: a YouTuber with 2 million subscribers who covers more or less all things technology might receive a regular press sample of a new pair of headphones. Another YouTuber, with 'only' 500,000 subscribers but an audiophile audience, will instead receive the golden sample (and a framed copy of the guidelines ;) ).
Golden samples /2
Last year, the case of some SuperNova B3 power supplies, branded EVGA and manufactured by Super Flower, caused a stir. It emerged, from the protagonists themselves, that EVGA had sent golden samples, free of any problems, to jonnyguru.com, the global reference point for power supply reviews. As everyone does. The problem (theirs) is that more and more websites, just as RecensioniVere does, are no longer playing along with the games of some lousy PR and are buying the products they review themselves. To EVGA's (and Super Flower's) misfortune, it was an industry information giant that bought the power supplies: Tom's Hardware. Not only did the units fail TH's tests, they showed serious safety flaws, complete with sparks! And when TH asked EVGA for samples to repeat the tests, the answer was, needless to say: NO. And this is not the first time EVGA products have played with fire: there were problems with the GTX 1070 and 1080 back in 2016.
Welcome to the real world!
Between guidelines and limited samples
If the submitted product must be tested under optimal conditions for the review to succeed, and if, indeed, that product may even be different from the one on the market, then what exactly are we reading? What benchmarks are we talking about? How credible are the test results? For us, as will have happened to many readers, the 100-euro office chairs broke; YouTube is full of people enthusiastic about theirs. Do we live in a parallel world? Years ago we reviewed a Gigabyte board with noticeable design problems, which we could only highlight because it was a purchase and not some kind of PR bribe.
Between real manufacturers, stickers and cuts
Outsourcing is a problem. We see one brand, but the producer is often another. We search for information on the web, and sometimes it is available, but the external partner may in turn commission someone else. Then the trail is lost. Rumours often circulate that a product with a certain set of initials, or a different 'made in', is of higher quality. Just as often, those rumours are absolutely true. Chips, batteries, capacitors, printed circuit boards: every single component of the devices we own may have been sub-subcontracted. And the more 'subs', the lower the costs and the worse the result in terms of efficiency, stability and durability. It would be time for a law obliging brands to declare this. Limited samples are sometimes assembled not in factories but in workshops, and then in the shop we buy the sub-sub-sub-sub...
What can we do? We are old...
The most important thing, when crunching benchmark numbers, would be to understand what you are actually reviewing: the motherboard? The CPU? A game? And try to focus on that. For us, RV is a hobby: we don't make money and we don't lose money. Ten years ago there were very few industry websites in Italy and a couple of influencers. Today there are many more, but quality has deteriorated in the same proportion. Among industry editors we sometimes talk, we exchange emails, and some of the causes of that deterioration are reported in this article. Having set up an equal relationship with the reader, as a conversation between friends, between peers, has raised the average age of the site, and somehow we have escaped the anxieties of chasing a youthful target. We will remain oldies among oldies, which is fine with us. ;)
P.S. The power supply brand Seasonic, in order to be above suspicion, has started shipping its samples through Amazon. This seems to us a good starting point; we look forward to seeing it do the same with non-sponsors as well.