FYI, I posted this today at BCT - thought I'd share it here as well:
I found it incredibly unusual that there wasn't an official or unofficial support thread for us Titan owners here, especially given there's no real official method of sending a rig in for repair and being certain of getting it back. My RMA process was fraught with pitfalls - so much so that in the end I decided it best to do what I could with my misbehaving cubes rather than risk sending them back to KnC for repair and possibly being charged, having them lost, stolen or broken, never getting them back, or getting them back in time for the next millennium... you get the idea. I've spent an inordinate amount of time coming up with a tuning procedure that works 100% of the time with my rigs, and I've wanted to share it for a long time. I'd also like to know what others have done to overcome bad controllers, cables, PSU issues, etc.
So I'll kick off this thread with my tuning procedure. As anyone with one of these rigs knows, they are as fidgety as a blind man in a shooting gallery. The dies, over time, go dark or just inactive and no amount of rebooting (cold or warm) will bring them back. When this happens...
Clock Speed Phase
- Do a cold-boot as you normally would to recover an iffy die.
- Do a factory reset on your rig (FWIW, clearing a cube's saved settings and history seems to get it hashing again in some cases).
- After the reset, put in a pool you know gives you a consistent hashrate, like coinotron, and let the rig settle out.
- Note the dies that aren't initializing or hashing - leave your voltages alone during this phase. Set every die that isn't initializing or hashing to 150MHz and warm-reboot. Consider dies hashing properly at 300MHz (stock speed) to be locked in.
- Again, let the rig settle in. Any die that has recovered at 150MHz and is hashing, turn up one setting to 175MHz. The ones that haven't recovered, set to 100MHz. Warm-reboot.
- Settle. Any dies that were hashing at 150MHz but aren't at 175MHz? Set those back to 150MHz and consider them locked in. Ones that are still hashing at 175MHz, tick them up to 200MHz. Any dies still inactive at 100MHz? Shut those OFF - they aren't really worth working on further. They won't recover without other means.
- Settle. Any dies hashing at 200MHz? If so, tick those up another step. Not hashing? Drop them back to 175MHz and consider them locked in.
- Settle. Continue this until every die is either locked in and hashing or shut off. The clock speed phase is over.
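For anyone who'd rather see the clock speed phase as logic than as prose, here's a rough Python sketch of the decision made for one die after each settle. It doesn't talk to the rig at all - you still make the changes in the web UI by hand. The names (Die, adjust, tune) are made up for illustration, and locking a die that recovers at 100MHz (rather than trying to push it higher) is my assumption, since the steps above don't spell that case out.

```python
from dataclasses import dataclass

STOCK_MHZ, RECOVERY_MHZ, FLOOR_MHZ, STEP_MHZ = 300, 150, 100, 25

@dataclass
class Die:
    mhz: int = STOCK_MHZ
    # untested -> recovering -> floor / climbing -> locked / off
    state: str = "untested"

def adjust(die: Die, hashing: bool) -> Die:
    """Apply one settle-and-adjust round to a single die."""
    if die.state in ("locked", "off"):
        return die
    if die.state == "untested":
        if hashing:
            die.state = "locked"            # fine at stock speed
        else:
            die.mhz, die.state = RECOVERY_MHZ, "recovering"
    elif die.state == "recovering":
        if hashing:
            die.mhz, die.state = RECOVERY_MHZ + STEP_MHZ, "climbing"
        else:
            die.mhz, die.state = FLOOR_MHZ, "floor"
    elif die.state == "floor":
        # Assumption: a die that recovers at 100MHz is locked there.
        die.state = "locked" if hashing else "off"
    elif die.state == "climbing":
        if not hashing:
            die.mhz -= STEP_MHZ             # back to the last working speed
            die.state = "locked"
        elif die.mhz >= STOCK_MHZ:
            die.state = "locked"            # made it back to stock
        else:
            die.mhz += STEP_MHZ
    return die

def tune(max_stable_mhz: int) -> Die:
    """Simulate the whole phase for a die that hashes at or below max_stable_mhz."""
    die = Die()
    while die.state not in ("locked", "off"):
        adjust(die, hashing=die.mhz <= max_stable_mhz)
    return die
```

The tune() helper is only there to try the logic out: tune(225) walks a die from 300 down to 150, back up through 175, 200 and 225, fails at 250, and locks it at 225. A real die isn't that predictable, so treat it as a way to check your own bookkeeping, nothing more.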
Power Tuning Phase
After all dies are locked into their highest possible frequency, now's the time to tune back the core voltage to minimize power consumption and heat (a bit).
- The stock V of -0.0366 is fine for OVERCLOCKED dies, that is, those you push up to 325MHz. Almost all of my 300MHz dies run at -0.0513 with minimal HW errors (<1%).
- As a rule of thumb, you can undervolt two steps for every 25MHz clock speed drop. Meaning, if a die will hash at 200MHz (four 25MHz drops below 300MHz, so eight steps below -0.0513), a setting of -0.1099 will likely supply enough V to maintain the hashing and keep errors low. This also avoids supplying power the die doesn't need to hash, and ticks down the heat generated by the die as well. Win/Win.
- How do you know about HW errors? SSH to the rig's IP address, issue a 'screen -r' command at the prompt, and look over the last info next to each cube, 'HW: XX/XX%' - you want the percentage to be below 1%. If it isn't, tick up that cube's dies one step of core-V at a time, restarting mining to verify after each change, until the dies are getting enough power.
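Here's the two-steps-per-25MHz rule of thumb worked out as a small Python sketch, in case the arithmetic helps. The step size of 0.00733 is NOT an official figure - I'm assuming it because it's what makes the three settings quoted above (-0.0366, -0.0513, -0.1099) line up. Function names are made up, and the results are only starting guesses: always pick the nearest value your rig's menu actually offers.

```python
BASE_MHZ = 300
BASE_OFFSET = -0.0513     # typical setting for 300MHz dies (from the post)
STEP_V = 0.00733          # ASSUMED size of one core-V step, inferred from the post's figures
STEPS_PER_25MHZ = 2       # rule of thumb from the post

def starting_offset(mhz: int) -> float:
    """Estimate a starting core-V setting for a die locked at `mhz`."""
    drops = (BASE_MHZ - mhz) // 25           # negative when overclocked
    return round(BASE_OFFSET - drops * STEPS_PER_25MHZ * STEP_V, 4)

def bump_if_erroring(offset: float, hw_error_pct: float) -> float:
    """Give the die one more step of core-V if HW errors are at or above 1%."""
    if hw_error_pct >= 1.0:
        return round(offset + STEP_V, 4)
    return offset

if __name__ == "__main__":
    for mhz in (325, 300, 200):
        print(mhz, starting_offset(mhz))
```

With those assumptions, 325MHz, 300MHz and 200MHz come out at -0.0366, -0.0513 and -0.1099, matching the settings above. Speeds in between are estimates only.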
That's all the tough stuff. If you do this right, you'll be rewarded with a much more efficient rig, both power- and temp-wise, with more working dies. Even a die hashing at 100MHz is an improvement over a non-hashing one at 300MHz. Pay particular attention to temps after you are done. Tune 'hot' cubes and dies down to fit a temp envelope of lower than 105 degrees C to keep things going smoothly. Yes, many say the cooler the better; however, I have cubes and dies that won't hash at all when they are cool (75 degrees C) but hash fine at 100 degrees.
Always perform a visual inspection of cables, connectors and PSUs at least once or twice a week, looking for discolored or melting cables, and replace any you find immediately. As an added measure, note which cube on the rig the affected cable feeds, and tune that cube down a bit (MHz-wise) so it draws less current over those cables - they'll last longer that way. I know there are many here that run their cubes balls-out at 325 and full V all day and all night long - but unless you are running 12 gauge PSU cables, there will be a comeuppance for you and your rigs (and possibly your house).
I'm sure many of you have your own procedures, but these have served me well (so far) - and I hope others join this thread and add their own experiences and wisdom. I'm definitely looking for someone to chime in on how to remove the heatsinks and re-do their thermal paste applications on these things. I have some dies myself that run 10-20 degrees hotter than others, even in the same cube, and would love to be walked through removal of the heatsink so as not to damage the cube altogether.
All we have is each other - let's share experiences and info.
Donations Welcome: 163fDhK9sNwL7fdjWK6QZ6gNYayYDRWGFn