Titan Tuning Primer... Secret Sauce Recipe


#1

FYI, I posted this today at BCT - thought I’d share it here as well:

I found it incredibly unusual that there wasn’t an official or unofficial support thread for us Titan owners here, especially given there’s no real official method of sending a rig for repair and receiving it back with certainty. My RMA process was fraught with pitfalls - so much so that in the end I decided it best to just do what I could with my misbehaving cubes rather than risk sending them back to KnC for repair and possibly being charged, having them lost, stolen or broken, never getting them back, or getting them back in time for the next millennium… you get the idea. I’ve spent an inordinate amount of time coming up with a tuning procedure that works 100% of the time with my rigs, and I’ve wanted to share it for a long time. I’d also like to know what others have done to overcome bad controllers, cables, PSU issues, etc.

So I’ll kick off this thread with my tuning procedure. As anyone with one of these rigs knows, they are as fidgety as a blind man in a shooting gallery. The dies, over time, go dark or just inactive and no amount of rebooting (cold or warm) will bring them back. When this happens…

Clock Speed Phase

  1. Do a cold-boot as you normally would to recover an iffy die.
  2. Do a factory-reset on your rig (FWIW, clearing a cube’s saved settings and history seems to help get its dies hashing again).
  3. After the reset, put in a pool you know gives you a consistent hashrate, like coinotron, and let the rig settle out.
  4. Note the dies that aren’t initializing or hashing - leave your voltages alone during this phase. Set every die that isn’t initializing or hashing to 150MHz and warm-reboot. Consider dies hashing properly at 300MHz (stock speed) to be locked in.
  5. Again, let the rig settle in. Any die that has recovered at 150MHz and is hashing, turn up one setting to 175MHz. Set the ones that haven’t recovered to 100MHz and warm-reboot.
  6. Settle. Any dies at 175MHz that were previously hashing and now aren’t? Set those back to 150MHz and consider them locked in. Ones that are still hashing at 175MHz, tick them up to 200MHz. Any dies still inactive at 100MHz? Shut those OFF - they aren’t really worth working on further. They won’t recover without other means.
  7. Settle. Any dies hashing at 200MHz? If so, tick those up a slot again. Not hashing? Drop them back to 175MHz and consider them locked in.
  8. Settle. Continue this until all dies are locked in and hashing. The clock speed phase is over.
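
For anyone who'd rather read the clock speed phase as logic, here's a minimal sketch in Python of the decision made for a single die after each settle. It only models the steps above and doesn't talk to a rig. The names (`Die`, `after_settle`, `tune`) are made up for illustration, and two details are my assumptions rather than part of the procedure: a die that hashes at 100MHz is simply locked in there, and tuning stops at the 300MHz stock speed.

```python
from dataclasses import dataclass
from typing import Optional

STOCK_MHZ = 300   # stock speed per the post
START_MHZ = 150   # first setting tried on a die that won't hash
FLOOR_MHZ = 100   # last resort before shutting a die off
STEP_MHZ = 25     # one "slot" up


@dataclass
class Die:
    mhz: int = STOCK_MHZ
    last_good: Optional[int] = None  # highest speed this die has hashed at
    state: str = "tuning"            # "tuning", "locked" or "off"


def after_settle(die: Die, hashing: bool) -> Die:
    """Pick the next setting for one die once the rig has settled."""
    if die.state != "tuning":
        return die
    if hashing:
        if die.mhz >= STOCK_MHZ or die.mhz == FLOOR_MHZ:
            # Assumption: lock in at stock, or at the 100MHz floor.
            die.state = "locked"
        else:
            die.last_good = die.mhz
            die.mhz += STEP_MHZ          # tick up one slot
    elif die.last_good is not None:
        die.mhz = die.last_good          # drop back and lock in
        die.state = "locked"
    elif die.mhz > START_MHZ:
        die.mhz = START_MHZ              # failed at stock: try 150MHz
    elif die.mhz == START_MHZ:
        die.mhz = FLOOR_MHZ              # failed at 150MHz: try 100MHz
    else:
        die.state = "off"                # failed at 100MHz: shut it off
    return die


def tune(max_working_mhz: int):
    """Simulate a die that hashes only at or below max_working_mhz."""
    die = Die()
    while die.state == "tuning":
        after_settle(die, hashing=die.mhz <= max_working_mhz)
    return die.mhz, die.state
```

As a usage example, `tune(180)` walks a pretend die through 300, 150, 175 and 200MHz and returns `(175, 'locked')`, which is the same outcome as steps 4 through 7.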

Power Tuning Phase - After all dies are locked into their highest possible frequency, now’s the time to tune back the core-voltage to minimize power consumption and heat (a bit).

  1. The stock V of -0.0366 is fine for OVERCLOCKED dies. That is, those that you push up to 325MHz. Almost all of my 300MHz dies run at -0.0513 with minimal (<1%) HW errors.
  2. As a rule of thumb, you can undervolt two steps for every 25MHz clock speed drop. Meaning, if a die will hash at 200MHz, a setting of -0.1099 will likely supply enough V to maintain the hashing and keep errors low. This also avoids supplying power the die doesn’t need to hash and ticks down the heat generated by the die as well. Win/Win.
  3. How do you know about HW errors? SSH to the rig’s IP address, issue a ‘screen -r’ command at the prompt and look over the last info next to each cube, ‘HW: XX/XX%’ - you want the percentage to be below 1%. If a cube is above that, tick up its dies one step of core-V at a time, restarting mining to verify after each change, until the dies are getting enough power.
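
The two-steps-per-25MHz rule above can be worked out as a small Python sketch. Treat it as a starting point only. The step size is an assumption inferred from the two figures in the post (-0.0513 at 300MHz and -0.1099 at 200MHz, i.e. 8 steps apart), and the function names are hypothetical. The final word is still the HW error percentage you see on the rig.

```python
STOCK_OFFSET = -0.0366   # stock core-V setting, kept for overclocked dies
REF_MHZ = 300
REF_OFFSET = -0.0513     # what most 300MHz dies run at in the post

# Assumption: one core-V step, inferred from the post's figures
# (4 drops of 25MHz = 8 steps between -0.0513 and -0.1099).
STEP_V = (0.1099 - 0.0513) / 8


def starting_offset(mhz: int) -> float:
    """Suggested first core-V setting for a die locked in at `mhz`."""
    if mhz > REF_MHZ:
        return STOCK_OFFSET              # overclocked dies stay at stock V
    drops = (REF_MHZ - mhz) // 25        # number of 25MHz drops below 300MHz
    return round(REF_OFFSET - 2 * drops * STEP_V, 4)


def needs_more_voltage(hw_error_pct: float) -> bool:
    """True when HW errors are at or above the 1% target."""
    return hw_error_pct >= 1.0
```

For a die locked in at 200MHz, `starting_offset(200)` comes out at -0.1099, matching the example in step 2. If `needs_more_voltage` is true after mining settles, move one step back toward stock and check again.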

That’s all the tough stuff. If you do this right, you’ll be rewarded with a much more efficient rig, both power and temp-wise, with more working dies. Even a die hashing at 100Mhz is an improvement over a non-hashing one at 300Mhz. Pay particular attention to temps after you are done. Tune ‘hot’ cubes and dies down to fit a temp envelope of lower than 105 degrees C to keep things going smoothly. Yes, many say the cooler the better, however, I have cubes and dies that won’t hash at all when they are cool (75 degrees C) but hash fine at 100 degrees.

Always perform a visual inspection of cables, connectors and PSUs at least once or twice a week to note discoloration or melting cables and replace them as necessary, immediately. As an added measure, note the cube’s number on the rig itself, and tune that cube down a bit (MHz-wise) to let the cube draw less current over those cables - they’ll last longer that way. I know there are many here that run their cubes balls-out at 325 and full V all day and all night long - but unless you are running 12 gauge PSU cables, there will be a comeuppance for you and your rigs (and possibly your house).

I’m sure many of you have your own procedures, but these have served me well (so far) - and I hope others join this thread and add their own experiences and wisdom. I’m definitely looking for someone to chime in on how to remove the heatsinks and re-do their thermal paste applications on these things. I have some dies myself that run 10-20 degrees hotter than others, even in the same cube, and would love to be walked through removal of the heatsink so as not to damage the cube altogether.

All we have is each other - let’s share experiences and info.

Donations Welcome: 163fDhK9sNwL7fdjWK6QZ6gNYayYDRWGFn


#2

Excellent post, thanks for sharing!


#3

Thanks rootdude - I lacked rootdude’s skills when I started tuning but had stumbled onto most of his basic process through what seemed like days of endless but fun trial and error. Tip on the way as I’ll keep these instructions and pass them on to my hosting provider as well!

I’m interested in how many people have upgraded to the Gen Tarkin firmware. I see this firmware as insurance against problems with a hosting provider if they experience any cooling issues. I upgraded and used it to control die temperature while I was tuning and now in production. I’m running at max 75C right now and am getting an average hash rate of 355Mh, with lots of peaks and valleys, without the need to cold reboot weekly. I’m not seeing the hash rate degrade over time but would like to figure out how to smooth out the hash rate.


#4

You’re welcome. A lot of work (time and effort) went into the process.

The peaks and valleys are all about the pool side - not the rig being unruly. When mining straight LTC, the hashrate reported by a good pool should vary by about +/- 10%. On switching pools, much more variance can be expected (as much as 40% in my experience).
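
If you want to check your own numbers against those +/- 10% and 40% figures, here's a rough Python sketch. It assumes you've jotted down a handful of pool-reported hashrate readings yourself. The function names and the sample values are made up for illustration.

```python
from statistics import mean


def max_deviation_pct(samples):
    """Largest swing away from the average, as a percentage of the average."""
    avg = mean(samples)
    return max(abs(s - avg) for s in samples) / avg * 100


def within_band(samples, band_pct=10.0):
    """True when every reading sits within +/- band_pct of the average."""
    return max_deviation_pct(samples) <= band_pct


# Hypothetical readings in the same units as the pool reports.
readings = [340, 355, 370]
print(within_band(readings))         # straight-LTC style band of 10%
print(within_band(readings, 40.0))   # looser band for switching pools
```

Readings that stay inside the band point at normal pool variance rather than a rig problem.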

I run Tarkin’s firmware on all of my rigs, but sometimes it gets in the way of a smoothly running rig, especially when a die is iffy. It’ll restart the bfgminer instance up to ten times trying to get the die to hash - so often I will either shut off the die, or allow the firmware to put the die into ‘bypass’ and leave it alone. You can also manually edit the bypass file via SSH to keep it from restarting, but this is not available through the GUI. Another helpful feature is the minion GUI, which keeps an overview of all of your rigs hashing over time. It’s the ‘more goodies’ link from the Status page. You can select ‘multi’ and add the IP addresses of your other rigs running Tarkin’s firmware.


#5

Thanks for the guidance and have a great Thanksgiving.


#6

You’re very welcome - Happy Holidays to you and yours and thanks very much for the generous tip!


#7

Tuning up cubes for a friend tonight (Christmas present) - the “screen -r” command is a real nice piece of advice.

He bought Seasonic 750 Gold PSUs so I’m tuning as much to the safety limits of the PSU as the cubes themselves.


#8

Nice present. Wanna be my friend? :smile:

That’s a shame… but any PSU in a storm, right?


#9

Sunk cost for him - shame to push the PSUs to their max.

Have you been able to get a wireless connection to the RPi working? I’ve tried two different devices and I can’t get wlan0 to show up under iwconfig. I just tried out a Kootek tonight after failing with the Cannisomethingarather. Really frustrating…