Channel: Floppy Emu – Big Mess o' Wires

Manufacturing is Hard

There’s a big difference between building one of something, and making a repeatable process to build 10 of them, or 100. Unfortunately I’m learning that the hard way while I try to get some more Floppy Emu boards ready to sell. If I had any hair, I’d be pulling it out! I never thought this would be so hard.

If you haven’t been following the earlier posts, Floppy Emu is a floppy disk drive emulator for vintage Macintosh computers. I built the first Floppy Emu for my personal use about a year ago, and while the soldering was a little challenging, everything worked once it was done. I posted the design on the BMOW web site, and since then I’d estimate about 10 other people have built their own Floppy Emu boards. Then in October I built two more boards from my remaining parts stock, and sold them on eBay. I tested those thoroughly before I sold them, so I’m confident those boards were working well.

The eBay sale generated lots of interest and requests for more boards, so in late October I created board revision 1.1 in preparation for a small hand-made “production run”. The board layout changed slightly to make room for mounting holes, and some board traces were moved or added. I also switched to a different PCB supplier, changed to a different brand of 3.3V LDO regulator, and replaced the Atmega1284P with the Atmega1284 to save a few pennies.

I built four of the rev 1.1 boards, and initially none of them worked. As described in my previous post, the new brand of 3.3V regulator proved to be unstable when combined with the output capacitor I’d been using. The oscillations on the 3.3V and 5V supply lines caused all kinds of erratic behavior and malfunctions that drove me crazy. I’ve since found that replacing the 10 uF ceramic output capacitor with a 33 uF tantalum solves that particular problem. Yet even with the capacitor fix, one of the boards exhibited occasional random write errors, and I somehow toasted another one during assembly.

Later I discovered a flaw in my CPLD firmware that was shorting the Mac’s PWM drive speed control input to GND. Floppy Emu doesn’t actually use that input, but shorting it to ground is not very nice, and may have damaged the CPLD, the Mac, or both. This only affected the rev 1.1 boards. That firmware flaw is now fixed, hopefully without any permanent damage.

I’ve since built two more of the rev 1.1 boards. One worked fine, but the other showed the same pattern of occasional random write errors. That means only three of the six rev 1.1 boards I’ve built actually work. Arghh! 50% yield is not good. The random write error is maddening. It doesn’t happen very often, so it’s necessary to do a LOT of testing before I can be confident a particular board does or doesn’t have this problem. I spent a long time with a lens, an oscilloscope, and a debugger trying to work out what’s going wrong, but failed. My best theories are:

 

Software Bug – Perhaps there’s a problem with the Floppy Emu software, like a timing bug or uninitialized variable, and tiny variations in boards or components cause the bug to appear or disappear. This was my first guess, but if true I would expect a continuous distribution of bugginess across boards, rather than two groups of “working” and “not working” boards. I tested the working boards heavily, and they really do work 100%. I also made many experimental software changes that I thought might cause the problem to appear or disappear, but there was no change in behavior. And to my knowledge none of the rev 1.0 boards have this problem, even though they use the same software.

Soldering Mistake – I may have created a bad solder joint somewhere, leading to flaky behavior. That’s possible, but it seems pretty unlikely I’d make the exact same soldering mistake twice in six boards. And I’ve visually inspected the problem boards carefully with a 10x magnifier, and touched up all the likely problem points with an iron, without any success.

CPLD Damage – Some of the CPLDs might have been damaged by the firmware bug that shorted PWM to GND, resulting in buggy behavior even after the firmware was fixed. That’s certainly possible, but then why weren’t all the CPLDs damaged? Why just two of them? If this is the true explanation, then future rev 1.1 boards should all work OK now that the firmware bug is fixed.

Atmega1284 vs Atmega1284P Variation – Maybe some minor difference between the two types of the AVR microcontroller is causing unexpected problems. As far as I know, the only difference is that the “P” version uses Atmel’s picoPower technology to enable very low power sleep modes. Since I’m not using those sleep modes, that difference shouldn’t matter.

Board Design Flaw – The rev 1.1 board could contain a design mistake not present in the original board, like substantial coupling between neighboring traces, signal reflections, or other noise that leads to intermittent problems. While the layout changes between rev 1.0 and 1.1 were minor, I can’t rule this possibility out.

Manufacturing Flaw – The rev 1.1 boards from Smart Prototyping might not be built to the same tolerances as the original boards from Dorkbot PDX. In terms of published specs like minimum trace width and spacing, the Smart Prototyping process should be fine, and I used their design rules file to verify my board in Eagle. I know other people have been successful with rev 1.0 boards not made by Dorkbot PDX, though I don’t think any have used Smart Prototyping specifically.

 

Unfortunately I’m at one of those points where I really don’t know where to go next. I could build a few more boards to test the CPLD damage theory. Or I could get some more Atmega1284Ps and build a few boards with those, or experiment with going back to the original PCB manufacturer or the rev 1.0 board design. But each of those experiments would take more time and money. I’d need to see at least five good boards and zero bad ones before I had any confidence that I’d solved the problem. Spread across all the possible problem causes, I could end up building several dozen test boards, and still come up empty-handed if the true cause is a software bug or something else I haven’t considered.
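
To put rough numbers on that “five good boards” threshold, here is a minimal Python sketch. It assumes each board fails independently with a fixed probability, and it uses the 2-in-6 rate of write-error boards seen so far as that probability. Both are simplifying assumptions, and the function names are made up for illustration. The same arithmetic applies to testing a single board for a rare write error. The 1-in-1000 per-write error rate used for that case is purely a hypothetical placeholder, not a measured value.

```python
import math

def chance_of_clean_run(fail_prob, trials):
    """Probability of seeing zero failures in `trials` independent tries."""
    return (1.0 - fail_prob) ** trials

def clean_trials_needed(fail_prob, confidence):
    """Smallest run of consecutive clean tries that would happen by luck
    with probability at most (1 - confidence) if the problem were still
    present at rate `fail_prob`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_prob))

# Assumption: 2 of the 6 rev 1.1 boards showed the random write error,
# so treat 1/3 as the per-board failure rate when nothing has been fixed.
board_fail_prob = 2 / 6

# Chance that 5 boards in a row come out good purely by luck.
print(f"{chance_of_clean_run(board_fail_prob, 5):.2f}")

# Clean boards in a row needed for 95% confidence the fix is real.
print(clean_trials_needed(board_fail_prob, 0.95))

# Hypothetical per-write error rate on a bad board (placeholder value).
write_error_prob = 1 / 1000

# Clean writes needed before calling a single board good at 95% confidence.
print(clean_trials_needed(write_error_prob, 0.95))
```

Under these assumptions, five good boards in a row would still turn up by luck about 13% of the time even if nothing had been fixed, which is why five is only a bare minimum. It also shows why a rarely occurring write error takes so much testing per board: the rarer the error, the longer the clean run needed to rule it out.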

