I have to admit that I greatly underestimated this problem during planning too. I've worked with a lot of different MCUs (mostly simple ones, though) and a few FPGAs, and I really didn't expect that simply running my application code would give me this much of a headache.
I expected the ZYNQ to behave like any Xilinx FPGA with a MicroBlaze soft CPU in it: the chip handles its own configuration, loads everything from the configuration flash memory and executes the included software binary on the processor.
In the case of the ZYNQ, the processor handles everything, and the FPGA part (PL) is just another peripheral that has to be configured from software. This in itself wouldn't be much trouble. Xilinx even supplies a wizard in the SDK that creates a nice so-called FSBL (First Stage Boot Loader), which searches the configuration memory for the FPGA bitstream and application images, checks and loads everything, and makes life very, very easy...
...provided you have external DDR SDRAM attached. If not, well, you're scr*wed. Of course, now that I've struggled with it for a week, the whole thing seems simple. I now know which modifications the generated FSBL needs in order to work without external memory (there are some tech tips on the Xilinx website that give guidance), but a few days earlier I was starting to get quite desperate.
My current dilemma is that I'd like to do this neatly, with the application as a separate image loaded by the FSBL. That way I could save some valuable on-chip memory (OCM) space and also reuse the FSBL for different projects. I could run the FSBL in execute-in-place (XIP) mode directly from the flash, so that it doesn't get loaded into the OCM and can copy the application binary there. However, even when executed from the flash, the FSBL still needs its data sections in the OCM. In that case, how should I partition the precious 256 KB of OCM? (.bss + .data for the FSBL is around 75 KB.)
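To put numbers on the dilemma, here is a minimal sketch of the OCM budget arithmetic. It only uses the two figures above (256 KB of OCM, roughly 75 KB of FSBL .data + .bss); the macro and function names are made up for illustration and are not part of any Xilinx API.

```c
#include <stdint.h>

#define KIB 1024u
/* Total on-chip memory on the ZYNQ, as quoted above. */
#define OCM_TOTAL_BYTES (256u * KIB)

/* Hypothetical helper: bytes of OCM left for the application image once
   the XIP FSBL's .data + .bss have been placed in OCM.
   Returns 0 if the FSBL's data alone would not fit. */
uint32_t ocm_left_for_app(uint32_t fsbl_data_bss_bytes)
{
    if (fsbl_data_bss_bytes >= OCM_TOTAL_BYTES)
        return 0u;
    return OCM_TOTAL_BYTES - fsbl_data_bss_bytes;
}
```

With the FSBL at about 75 KB, `ocm_left_for_app(75u * KIB)` leaves 181 KB for the application. One thing that might be worth checking: the FSBL's data is only needed until the handoff, so the application's zero-initialized sections, heap and stack (which the FSBL does not have to copy) could in principle overlap that region. The loaded parts of the image could not, because they are written while the FSBL is still running.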
I think I'll just merge the FSBL and the application software into a single binary that is loaded automatically at startup. It will load the bitstream, but it won't look for further images on the flash and won't hand over execution. Instead, it will simply go on to do what the application software should. This way I can start writing actual application code tomorrow.
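The difference between the stock FSBL flow and the merged single-binary flow can be sketched as a tiny state machine. This is an illustration only: the step names and `boot_next()` are hypothetical and do not correspond to functions in the generated FSBL sources.

```c
#include <stdbool.h>

/* Hypothetical boot steps, for illustration only. */
typedef enum {
    STEP_PS_INIT,   /* clocks, MIO and other PS setup */
    STEP_LOAD_PL,   /* read the bitstream from flash and configure the PL */
    STEP_LOAD_APP,  /* stock flow only: copy the application image into OCM */
    STEP_HANDOFF,   /* stock flow only: jump to the application entry point */
    STEP_RUN_APP,   /* merged flow: application code runs in the same binary */
    STEP_FAIL
} boot_step_t;

/* Returns the step that follows `cur`. `ok` is the result of `cur`;
   `merged` selects the single-binary flow instead of the stock FSBL flow. */
boot_step_t boot_next(boot_step_t cur, bool ok, bool merged)
{
    if (!ok)
        return STEP_FAIL;
    switch (cur) {
    case STEP_PS_INIT:
        return STEP_LOAD_PL;
    case STEP_LOAD_PL:
        /* The merged binary stops looking for further partitions here. */
        return merged ? STEP_RUN_APP : STEP_LOAD_APP;
    case STEP_LOAD_APP:
        return STEP_HANDOFF;
    case STEP_HANDOFF:
    case STEP_RUN_APP:
        /* Terminal from the loader's point of view. */
        return cur;
    default:
        return STEP_FAIL;
    }
}
```

The merged flow drops two steps (and the partition search and image checks behind them), which trades the reusable loader for less code and no second image in flash.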