Designing an Instruction Set Architecture is fine. Making one that won't shoot you in the foot later is something else!
A nice and efficient architecture must not only run fast and be easy to program; its complete state must also be easy to save and restore, for example for debugging, interrupt handling, virtualisation and context switching (with a preemptive kernel).
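To make "fully save and restore" concrete, here is a minimal sketch in C of a preemptive switch on a made-up machine. The `toy_cpu` structure, its 16 registers and every function name are assumptions invented for illustration, not any real architecture. The only point is that every bit of state a running program can observe has to fit in the saved context and be writable on the way back.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NUM_REGS 16   /* hypothetical register count */

/* Hypothetical machine: everything a running program can observe. */
typedef struct {
    uint32_t regs[TOY_NUM_REGS];
    uint32_t pc;
    uint32_t status;      /* flags, interrupt enable, ... */
} toy_cpu;

/* The saved context mirrors ALL of toy_cpu; anything left out is lost. */
typedef toy_cpu toy_context;

static void toy_save(const toy_cpu *cpu, toy_context *ctx)    { *ctx = *cpu; }
static void toy_restore(toy_cpu *cpu, const toy_context *ctx) { *cpu = *ctx; }

/* Preempt task A, let task B run, then resume A.
 * Returns 1 if A's state comes back bit-identical, 0 otherwise. */
int toy_roundtrip_ok(void)
{
    toy_cpu cpu;
    toy_context before, task_a, task_b;

    memset(&cpu, 0, sizeof cpu);
    memset(&task_b, 0, sizeof task_b);

    /* Task A is running with some arbitrary state. */
    for (unsigned i = 0; i < TOY_NUM_REGS; i++)
        cpu.regs[i] = 0x1000u + i;
    cpu.pc = 0x0200u;
    cpu.status = 0x5u;
    before = cpu;                 /* reference copy for the final check */

    /* Preemption: save A, load B, let B clobber the machine. */
    toy_save(&cpu, &task_a);
    task_b.pc = 0x0800u;
    toy_restore(&cpu, &task_b);
    cpu.regs[3] = 0xDEADu;
    cpu.status = 0x2u;
    cpu.pc += 4;
    toy_save(&cpu, &task_b);

    /* Resume A: it must not be able to tell it was ever interrupted. */
    toy_restore(&cpu, &task_a);
    return memcmp(&cpu, &before, sizeof cpu) == 0;
}
```

The round trip only works here because `toy_context` covers the whole of `toy_cpu`. As soon as the machine holds state that the save path cannot read, or the restore path cannot write, the final comparison fails and preemption stops being transparent.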
If you want to see an example of what NOT to do, look at the PIC16F architecture. Interrupt Service Routines are tricky and, most importantly, the hardware stack is a P.I.T.A.: at least on the classic mid-range parts, software can neither read nor write it. (I managed to make #PIC16F/OS but it is limited to a fully cooperative system.)
Some architectures provide two or more banks of registers to reduce interrupt latency, though often this just postpones the backup process. F-CPU implemented the SRB (Smooth Register Backup) to address this issue.
But what really kills is state you can't save and restore. Status registers are often tricky because of all their side effects. Don't forget the coprocessors either. And some "smart ideas" and "tricks" end up being kludges, such as the MSP430's hardware multiplier, a peripheral where not all registers can be written (so you can't easily save and restore its state).
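Here is a rough sketch in C of how one non-writable register breaks the round trip. The `toy_mac` peripheral below is a hypothetical multiply-accumulate unit invented for this example; it is only loosely inspired by that kind of design and does NOT reproduce the real MSP430 register map. The assumption is that the carry register is set by the hardware alone and has no write path from the bus.

```c
#include <stdint.h>

/* Hypothetical multiply-accumulate peripheral (NOT the real MSP430 map). */
typedef struct {
    uint16_t op1;     /* readable and writable */
    uint32_t acc;     /* accumulator: readable and writable */
    uint16_t carry;   /* read-only from the bus, set only by the hardware */
} toy_mac;

/* Bus-visible writes. Note that there is no toy_mac_write_carry(). */
static void toy_mac_write_op1(toy_mac *m, uint16_t v) { m->op1 = v; }
static void toy_mac_write_acc(toy_mac *m, uint32_t v) { m->acc = v; }

/* Writing op2 triggers acc += op1 * op2 and updates the carry. */
static void toy_mac_write_op2(toy_mac *m, uint16_t v)
{
    uint64_t sum = (uint64_t)m->acc + (uint64_t)m->op1 * v;
    m->acc   = (uint32_t)sum;
    m->carry = (uint16_t)(sum >> 32);
}

/* Interrupted code has a pending carry; an ISR borrows the peripheral and
 * then restores everything it is able to write.
 * Returns 1 if the carry survived, 0 if the restore was lossy. */
int toy_mac_restore_is_exact(void)
{
    toy_mac m = {0, 0, 0};

    /* Main code: 0xFFFFFFFF + 2 * 1 overflows, so carry becomes 1. */
    toy_mac_write_acc(&m, 0xFFFFFFFFu);
    toy_mac_write_op1(&m, 2);
    toy_mac_write_op2(&m, 1);

    /* ISR entry: every register can be READ, so saving looks fine. */
    uint16_t saved_op1   = m.op1;
    uint32_t saved_acc   = m.acc;
    uint16_t saved_carry = m.carry;

    /* ISR body: 0 + 3 * 4 = 12, which clears the carry. */
    toy_mac_write_acc(&m, 0);
    toy_mac_write_op1(&m, 3);
    toy_mac_write_op2(&m, 4);

    /* ISR exit: write back what can be written; the carry cannot. */
    toy_mac_write_op1(&m, saved_op1);
    toy_mac_write_acc(&m, saved_acc);

    return m.carry == saved_carry;
}
```

In this model `toy_mac_restore_is_exact()` returns 0: the operand and the accumulator come back, but the interrupted code now reads a carry of 0 instead of 1. The only ways out are kludges, such as masking interrupts around every use of the peripheral or banning it from interrupt handlers altogether.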
This is a strong argument FOR a unified register set and an orthogonal ISA, as well as for minimising the number of status flags and registers.