Earlier, I was able to write to all 512 (or fewer) buffers from the PRU, but they were not being added back to the vring as free buffers after the host processor had consumed the data. This last roadblock has (hopefully) been solved, resulting in successful transfers from PRU to ARM, including continuous data streaming over the vring :D
One major optimization remains: the transfer rate I get depends on how often one kicks (i.e. interrupts) the ARM. Kicking the ARM after every 512-byte buffer stalls it, because too many interrupts arrive within too short a time. Kicking only after the PRU has filled all 512 buffers forces the PRU to wait for the ARM to consume buffers before it can start using them again. So, to attain maximum throughput, an optimum number of buffers to fill before kicking the ARM needs to be found. The user is also free to choose this value (in the PRU firmware) depending on their requirements.
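The trade-off above can be modelled with a small counter that batches filled buffers and only signals a kick once a threshold is reached. This is only a host-side sketch of the idea, not the actual firmware: the names `kick_policy`, `kick_threshold` and `count_kicks` are made up for illustration, and the real PRU code would issue its own interrupt where the function returns 1.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUFFERS 512u /* number of vring buffers, as in the post */

/* Hypothetical tuning state: none of these names come from the real firmware. */
struct kick_policy {
    uint32_t kick_threshold; /* filled buffers to batch per kick, 1..NUM_BUFFERS */
    uint32_t pending;        /* buffers filled since the last kick */
    uint32_t kicks;          /* total kicks issued so far (for measurement) */
};

static void kick_policy_init(struct kick_policy *p, uint32_t threshold)
{
    /* A threshold of 0 would never kick; more than NUM_BUFFERS could never be reached. */
    assert(threshold >= 1 && threshold <= NUM_BUFFERS);
    p->kick_threshold = threshold;
    p->pending = 0;
    p->kicks = 0;
}

/* Call once after each buffer is filled. Returns 1 when the ARM should be kicked now. */
static int kick_policy_buffer_filled(struct kick_policy *p)
{
    p->pending++;
    if (p->pending >= p->kick_threshold) {
        p->pending = 0;
        p->kicks++;
        return 1; /* the firmware would raise the interrupt to the ARM here */
    }
    return 0;
}

/* Count how many kicks are issued while filling n buffers with a given threshold. */
static uint32_t count_kicks(uint32_t threshold, uint32_t n)
{
    struct kick_policy p;
    kick_policy_init(&p, threshold);
    for (uint32_t i = 0; i < n; i++)
        (void)kick_policy_buffer_filled(&p);
    return p.kicks;
}
```

With a threshold of 1 this issues one kick per buffer (the interrupt-storm case), and with a threshold of 512 it issues a single kick per full ring (the case where the PRU ends up waiting). A value in between is what would need to be measured on real hardware; the sketch does not say which value is best.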
The patches for 4.1 expose a misc char device that streams the data to the user.