It had come to my attention that some customers had boards that would stop working. When they returned them and I did a postmortem, it was always the LAN8720AI Ethernet PHY that had died.
Eventually I was able to recreate a setup that would reliably kill the LAN8720AI. If I had a wESP32 connected to a particular PoE switch, with the USB programmer connected to a particular laptop, the chip would die when plugging in the Ethernet jack. So I set out to determine the root cause and find a solution.
Using a battery-powered oscilloscope to measure what exactly happened when plugging in the Ethernet cable, I could see spikes on the 3.3V supply and ground at the moment of connection. My theory was that the charges on the various EMI caps connecting primary to secondary in the PoE switch power supply, the laptop power supply and the wESP32 would redistribute on connection and cause ground bounce that would, for a very short time, expose the LAN8720AI to out-of-spec voltages.
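To put a rough number on the charge redistribution idea, here is a back-of-the-envelope sketch. It is a deliberately simplified model, assuming just two capacitors that start at different voltages relative to earth and are suddenly connected together. All component values and voltages below are made up for illustration; they are not measurements from this setup.

```python
def shared_voltage(c1_farads, v1_volts, c2_farads, v2_volts):
    """Voltage two capacitors settle at once connected (total charge is conserved)."""
    total_charge = c1_farads * v1_volts + c2_farads * v2_volts
    return total_charge / (c1_farads + c2_farads)


# Hypothetical values for illustration only, not measured on any real setup.
C_SWITCH = 2.2e-9   # EMI cap in the PoE switch supply, farads
V_SWITCH = 60.0     # its voltage relative to earth before connection, volts
C_BOARD = 1.0e-9    # EMI cap on the board side, farads
V_BOARD = 0.0       # its voltage relative to earth before connection, volts

v_final = shared_voltage(C_SWITCH, V_SWITCH, C_BOARD, V_BOARD)
board_step = v_final - V_BOARD  # sudden shift seen on the board side
print(f"settles at {v_final:.2f} V, board side jumps by {board_step:.2f} V")
```

A real setup has more than two capacitors plus cable and stray inductance, so this says nothing about the actual spike shape. It only shows that a larger voltage difference before connection means a larger step when the charges even out.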
Now doing these kinds of measurements, you're never really sure about what you're seeing, what's real and what's just a measurement artifact. So I wasn't really sure, but it was something to go after. I would try to improve supply filtering, add TVSes for ESD protection on the supply and signal pairs, and add filtering in the ground connection between the EMI cap and the circuit ground.
I first changed supply filter caps on an existing board. I tried larger bulk caps, and smaller close-to-the-pin caps for better response to high frequency spikes. Didn't help. I ordered TVSes, and added one to the 3.3V supply and filtered analog PHY supply. No difference.
I decided to create a new 4-layer board layout with footprints for ESD protection on the Ethernet differential pairs and the supply, plus a differential filter that would include filtering of ground spikes that might be coming through the PoE supply's EMI cap. I went to 4-layer with a 3.3V power plane adjacent to GND through 0.1 mm prepreg to provide improved HF power supply filtering.
Here's the prototype:
Here's the ESD protection chip I added to the data pairs, the little chip above the wasp:
Aaaaand it didn't help squat. 😠 The PHY died just the same as before.
Now what? Stock was running low, and I really would rather not make another 1000 potentially troublesome boards that might die in some setups. My goal is for the wESP32 to be powerful and reliable, able to take whatever you can throw at it. Dying when connected to certain equipment just won't do. I've only had a few customers complain about it, usually in some situation where there's external power, so it doesn't seem to be a widespread problem. But still, I don't find it acceptable.
I did some investigating and found that in the years since the wESP32 was first released, the ESP-IDF has come to support more PHY chips. It looks like IDF 4.x added more options, and the ESP32 Arduino core also already supports them. In the meantime, MicroPython has dropped the IDF 3.x based images that supported Ethernet, but support may be coming in the IDF 4.x based images. So this opened up possibilities. IDF, Arduino and MicroPython support are the most important for me; other software is likely to follow as it gets migrated from IDF 3.x to IDF 4.x.
I had tried to avoid changing PHY chips because I don't like forcing customers to have to change their software. It's only a change in the PHY definition, so it's minimal effort, but it's still a change that needs to be made. I had considered it a last resort, but after all this effort and killing tons of chips while testing, I had to finally admit defeat in my efforts to stop the LAN8720AI from dying. I had really tried hard, tried everything I could think of to protect the chip, to no avail.
So I set out to create a prototype with the RTL8201FI Ethernet PHY instead, to see if that one would keep working or if it would die as well. Because at this point, I didn't really know if another PHY would actually solve the problem or suffer the same fate. I redid the 4-layer layout with all the other improvements to give it the best chance of surviving, and created a prototype:
Slowly going through tests, working up to the worst case scenario that would always kill the LAN8720AI, this prototype with the RTL8201FI just kept working! Hurray! 😁
Since then I have done a lot more testing. I have tried it with various networking equipment, externally and/or PoE powered, I have replicated external power scenarios that caused problems for one customer, and nothing I have tried has killed this chip. It's working beautifully and reliably, and is just as fast. I have removed the extra ESD protection, done a lot of testing that way, and it keeps working just fine.
Which is how it should be, of course. Seriously, this isn't rocket science. It's a fully isolated system. I know how to do power supply design, I follow best practices for filtering and decoupling. I even went to 4-layer with a layer stackup optimized for decoupling HF. And nothing I did would save that lousy LAN8720AI. It's just a troublesome chip.
I remember how it had been troublesome in another way. Remember all the trouble I went through to get it to reset correctly? I had to add a cheap micro just to make that work reliably. The RTL8201FI spec doesn't mention any nonsense about needing the oscillator to run before releasing the PHY reset like the other chip did. It's a simple "wait at least 80 us after power is applied before releasing reset, then wait 150 ms before accessing the registers". So, while I had kept most of the board the same for the RTL8201FI prototype, including the PMS150C-559 to do the reset, I decided to see if I could remove the PMS150C-559 and just tie the PHY reset and oscillator enable to the ESP32's EN signal:
And what do you know: it works just fine. So easy.
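The RTL8201FI reset rule quoted above is simple enough to write down as a tiny check. This is only a sketch: the two constants come from the figures quoted in the text, while the function names and the idea of passing timestamps in microseconds are my own assumptions for illustration, not anything from the datasheet or the firmware.

```python
# Timing figures as quoted in the text for the RTL8201FI.
MIN_POWER_TO_RESET_RELEASE_US = 80        # power applied -> reset released
MIN_RESET_RELEASE_TO_MDIO_US = 150_000    # reset released -> first register access


def reset_sequence_ok(power_on_us, reset_release_us, first_access_us):
    """Return True if the three timestamps (in microseconds) respect both minimum delays."""
    return (
        reset_release_us - power_on_us >= MIN_POWER_TO_RESET_RELEASE_US
        and first_access_us - reset_release_us >= MIN_RESET_RELEASE_TO_MDIO_US
    )


def remaining_wait_us(elapsed_since_release_us):
    """Microseconds still to wait before the first register access (0 if already safe)."""
    return max(0, MIN_RESET_RELEASE_TO_MDIO_US - elapsed_since_release_us)
```

For example, `reset_sequence_ok(0, 100, 150_100)` passes, while releasing reset only 50 us after power-up fails. With reset tied to EN, the PHY comes out of reset at the same moment the ESP32 starts booting, so `remaining_wait_us` shows how software could top up the delay if it ever reached the PHY registers less than 150 ms after startup.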
So, bye bye LAN8720AI, and good riddance. You were difficult and overly sensitive. Hello and welcome RTL8201FI, you seem to be a much better part, not such a snowflake, and ready to do your job without any fuss.