What should be initialised by hardware, and how much must be delegated to software?
The internal clock tree is already a significant resource: it dissipates quite a lot of power and uses real estate (including a good share of one metal layer). Adding another global signal that will be used infrequently can be a sad waste of resources.
One example is the Alpha 21064 (if I remember correctly), which has no /RESET signal for the register set and many other resources. The firmware has to initialise everything with carefully selected instruction sequences, which are very implementation-dependent.
So basically you just need to reset the PC to the right place (the beginning of the bootstrap code, which is address 0) and you save a bunch of /RESET lines. The savings pay off as more free metal area and better routing for other useful signals, and eventually better speed or similar gains. There are certainly some other signals that are really critical to the chip, such as IO pin directions and states, but they are far fewer than the bits in the various high-speed arrays (register set, cache, LRU, TLB, branch predictor history, BTB...).
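To make the idea concrete, here is a minimal sketch in Python of a toy core where only the PC has a hardware reset and the bootstrap code initialises everything else. The `ToyCore` class, the `li`/`mov` opcodes and the four-register file are all hypothetical, made up for illustration; they do not model the Alpha or any real design.

```python
UNKNOWN = None  # stands for a flip-flop that was never reset


class ToyCore:
    """Hypothetical 4-register core: only the PC has a hardware reset."""

    def __init__(self, program, num_regs=4):
        self.program = program
        self.pc = UNKNOWN                    # unknown until /RESET is asserted
        self.regs = [UNKNOWN] * num_regs     # register file has no reset line

    def reset(self):
        # The single /RESET line: force the PC to the bootstrap address.
        self.pc = 0

    def step(self):
        op, dst, src = self.program[self.pc]
        if op == "li":        # load immediate: reads no register
            self.regs[dst] = src
        elif op == "mov":     # register copy: reads regs[src]
            if self.regs[src] is UNKNOWN:
                raise ValueError(f"pc={self.pc}: r{src} read before initialised")
            self.regs[dst] = self.regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc += 1

    def run(self):
        if self.pc is UNKNOWN:
            raise RuntimeError("PC was never reset")
        while self.pc < len(self.program):
            self.step()
        return self.regs


# Bootstrap sequence at address 0: the first instruction must not read
# any register, since all of them are still unknown at that point.
BOOT = [
    ("li", 0, 0),
    ("mov", 1, 0),
    ("mov", 2, 0),
    ("mov", 3, 0),
]

core = ToyCore(BOOT)
core.reset()
print(core.run())
```

The point of the sketch is the ordering constraint: the boot code has to start with instructions that depend on no prior state, which is why such sequences end up tied to the implementation.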
FPGAs usually force you to initialise your flip-flops: they have dedicated hardware and routing nets for this purpose. But you usually don't initialise all your SRAM blocks (though some FPGAs can do it), so if you aim for high performance in an ASIC, save the resources used by /RESET to get actual work done. Before removing the /RESET lines, simulate or emulate your design with the flip-flops initialised to undetermined states like "U" or "X". Use latches to save 1/2 of the surface of flip-flops, when you can. Don't get lazy, FPGAs are only for prototyping ;-)
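Here is a rough sketch of what simulating with undetermined states buys you, written as three-valued logic in plain Python rather than a real HDL simulator. The two flip-flops (`toggle` and `reg`), the `simulate` helper and the "software writes 0 on cycle 0" behaviour are assumptions invented for the example.

```python
X = "X"  # unknown value, in the spirit of 'U' / 'X' in an HDL simulator


def and3(a, b):
    # A known 0 on either input masks an unknown on the other one.
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def not3(a):
    return X if a == X else 1 - a


def mux3(sel, a, b):
    """Return a when sel is 1, b when sel is 0, X when it cannot be decided."""
    if sel == X:
        return a if (a == b and a != X) else X
    return a if sel == 1 else b


def step(state, load, data):
    """One clock edge for two flip-flops that have no /RESET input."""
    return {
        # Feeds only on itself: it never sees a known value.
        "toggle": not3(state["toggle"]),
        # Can be written from outside, so software can initialise it.
        "reg": mux3(load, data, state["reg"]),
    }


def simulate(cycles):
    state = {"toggle": X, "reg": X}  # power-up: everything unknown
    history = []
    for cycle in range(cycles):
        load = 1 if cycle == 0 else 0  # hypothetical boot code writes 0 once
        state = step(state, load, 0)
        history.append(dict(state))
    return history


def unresolved(state):
    """Names of the flip-flops that are still unknown."""
    return sorted(name for name, value in state.items() if value == X)


final = simulate(4)[-1]
print(unresolved(final))                      # flip-flops that still need a reset
print(and3(final["toggle"], final["reg"]))    # unknown masked by a known 0
```

In this toy model `reg` becomes known after the first clock, while `toggle` stays at X forever because its next state depends only on itself. That is the kind of flip-flop that still needs a /RESET line, or a software-visible load path, before the global reset can go.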