I don't know, I think you can get to a point where good enough is good enough. The wisdom of the crowd is usually a very reliable guide, assuming a diverse enough sample population.
If it were only minor balance tweaks, then yeah, some in-game testing would be needed to clarify the quality of the changes. But when it comes to significant changes and reworks, their impacts are usually pretty obvious just on paper.