Most commercial companies claim that recognition software can achieve between 98% to 99% accuracy (getting one to two words out of one hundred wrong) if operated under optimal conditions. These optimal conditions usually means the test subjects have 1) matching speaker characteristics with the training data, 2) proper speaker adaptation, and 3) clean environment (e.g. office space). (This explains why some users, especially accented, might actually find that the recognition rate could be perceptually much lower than the expected 98% to 99%).
Other, limited vocabulary, systems requiring no training can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.