[Dev] NeoScrypt GPU Miner - Public Beta Test

Wolf0

I really hope, Wolf, that you are not doing this just for yourself…

You would gain more if you released your work. I'm sure people here would be willing to put together a bounty for you to release your latest kernels.

These are fair people.

I’m doing this because it’s interesting. Also, SMix being parallelizable hardly matters unless you split it into 3 kernels, which is doable, but I don't know what the overhead of the kernel launches would be…

T4rQu1N

Quick question: in my .bat file, how do I specify different values for, say, -i or -w so that my two cards (which are different) get different settings?

-i 14,15?

-w 48, 72?

Kind regards,

T4

einkerl

"intensity" : "18,18,18",
"worksize" : "256,128,256",

Specify them in your .conf file.
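The comma-separated form above gives one value per GPU, in device order. Below is a minimal Python sketch of how such a per-GPU setting could be split and checked. The helper name `per_gpu_values` is hypothetical, and the rule that a single value applies to every card is an assumption, not something taken from the miner's source.

```python
def per_gpu_values(setting: str, n_gpus: int) -> list[int]:
    """Split a per-GPU setting such as "256,128,256" into one int per card.

    Hypothetical helper. Assumption: a single value is reused for every GPU;
    otherwise the number of values must match the number of GPUs.
    """
    values = [int(part.strip()) for part in setting.split(",")]
    if len(values) == 1:
        return values * n_gpus
    if len(values) != n_gpus:
        raise ValueError(f"expected 1 or {n_gpus} values, got {len(values)}")
    return values


if __name__ == "__main__":
    # The .conf values from the post, for a three-card rig.
    print(per_gpu_values("18,18,18", 3))     # prints [18, 18, 18]
    print(per_gpu_values("256,128,256", 3))  # prints [256, 128, 256]
```

Whitespace around the commas (as in "48, 72") is tolerated by the `strip()` call.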

ghostlander

neoscrypt_vliw.cl v2

It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.
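The independence mentioned here can be shown with a toy model. Assume the NeoScrypt data flow from the white paper: the FastKDF output is copied, one copy goes through the ChaCha SMix, the other through the Salsa SMix, and the two results are XORed. Under that assumption the two SMix calls share no state, so they can run concurrently and only the XOR joins them. `toy_smix` below is a hypothetical stand-in built on SHA-256, not ChaCha or Salsa, and the threads only stand in for the separate GPU kernels Wolf0 mentions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def toy_smix(block: bytes, tag: bytes) -> bytes:
    """Hypothetical stand-in for SMix: deterministic, but NOT ChaCha/Salsa."""
    out = block
    for _ in range(8):
        out = hashlib.sha256(tag + out).digest()
    return out


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def mix_sequential(x: bytes) -> bytes:
    z = toy_smix(x, b"chacha")  # ChaCha branch (works on a copy of X in NeoScrypt)
    y = toy_smix(x, b"salsa")   # Salsa branch (works on X itself)
    return xor_bytes(y, z)


def mix_parallel(x: bytes) -> bytes:
    # Neither branch reads the other's output, so both can be in flight at once;
    # only the final XOR has to wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fz = pool.submit(toy_smix, x, b"chacha")
        fy = pool.submit(toy_smix, x, b"salsa")
        return xor_bytes(fy.result(), fz.result())


if __name__ == "__main__":
    x = hashlib.sha256(b"block header").digest()
    print(mix_sequential(x) == mix_parallel(x))  # True
```

On a GPU the analogous split would be a ChaCha kernel, a Salsa kernel and a combining kernel, which is where the kernel launch overhead question comes in.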

I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.
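Why one shuffle at each end can be enough for the XOR steps: a fixed word permutation commutes with XOR, so data can stay in the shuffled layout between the entry and exit shuffles. The Python sketch below assumes the diagonal word order `5 * i % 16` found in some SIMD scrypt implementations, and assumes every block XORed in (for example V entries) is already stored in that layout. It covers only the XOR part; the Salsa rounds themselves would have to be written against the shuffled indices.

```python
import random

# Assumed diagonal layout: slot i holds word (5 * i) mod 16 of the 16-word block.
# 5 is coprime to 16, so this is a permutation.
PERM = [(5 * i) % 16 for i in range(16)]


def shuffle(block):
    return [block[p] for p in PERM]


def unshuffle(block):
    out = [0] * 16
    for slot, word in enumerate(PERM):
        out[word] = block[slot]
    return out


def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]


def xor_chain_plain(blocks):
    acc = blocks[0]
    for b in blocks[1:]:
        acc = xor_words(acc, b)
    return acc


def xor_chain_shuffled(blocks):
    # Simulate inputs that were stored in the shuffled layout
    # (e.g. V entries written from an already-shuffled state).
    stored = [shuffle(b) for b in blocks]
    acc = stored[0]
    for b in stored[1:]:
        acc = xor_words(acc, b)  # XORs work directly on shuffled data
    return unshuffle(acc)        # a single shuffle back at the end


if __name__ == "__main__":
    rng = random.Random(1)
    blocks = [[rng.getrandbits(32) for _ in range(16)] for _ in range(4)]
    print(xor_chain_plain(blocks) == xor_chain_shuffled(blocks))  # True
```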

It is, but that’s not what concerns me now. With FastKDF removed, the kernel shrinks by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled it does 58KH/s, and with only Salsa 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar: the AMD compiler isn’t really smart, and the kernel is about double the size of the ChaCha-only one. Anyway, there is a huge bottleneck somewhere and it needs to be identified.
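One way to read these figures is to convert rates into time per hash, where stage costs add. The sketch below assumes the 58 and 56 KH/s single-mix figures were measured with FastKDF removed, like the 30 KH/s one, and that the stages do not overlap; both are assumptions.

```python
def combined_rate(*rates_khs: float) -> float:
    """Rate of running several stages back to back, if their per-hash times simply add."""
    return 1.0 / sum(1.0 / r for r in rates_khs)


def stage_share(rate_with: float, rate_without: float) -> float:
    """Fraction of per-hash time spent in a stage, from rates with and without it."""
    return 1.0 - rate_with / rate_without


if __name__ == "__main__":
    # ChaCha-only and Salsa-only figures from the post.
    print(round(combined_rate(58.0, 56.0), 1))  # 28.5
    # Full kernel (19.5 KH/s) vs. FastKDF removed (30 KH/s).
    print(round(stage_share(19.5, 30.0), 2))    # 0.35
```

Under these assumptions the two mixes together would be expected at roughly 28.5 KH/s, close to the measured 30 KH/s, and FastKDF plus BLAKE2s account for about 35% of the full kernel's time per hash.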

Wolf0

I have a really hard time reading your style, but the code is pretty good! Don’t you think that bottleneck is waiting for global memory, though?

ghostlander

My first guess is that it runs out of private memory. It takes 512 bytes for block mixing plus 800 bytes for FastKDF and BLAKE2s per kernel instance, not including local variables, counters, etc. Scrypt consumes 3 times less private memory. It’s the opposite for global memory requirements, so you are not going to exceed those. Although the GCN cards report about the same amounts of local and constant memory (32KB + 64KB), they also have 32KB of L1 cache, which may help. Maybe they also have more private space (registers). Global memory is used for the V space only, so there is not much activity there. Everything else runs in private/local space.
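The per-instance figures above can be checked with a little arithmetic. The sketch assumes NeoScrypt's default parameters N = 128, r = 2 and Litecoin scrypt's N = 1024, r = 1, with the V space holding N blocks of 128 * r bytes. The register budget of 256 is an assumption (the per-work-item VGPR limit usually cited for GCN), and, as the post says, the count ignores local variables and counters.

```python
def v_space_bytes(n: int, r: int) -> int:
    """Global memory for the SMix V array: N blocks of 128 * r bytes."""
    return 128 * r * n


def private_dwords(*buffers_bytes: int) -> int:
    """32-bit registers needed just to hold the listed private buffers."""
    return sum(buffers_bytes) // 4


if __name__ == "__main__":
    neo_v = v_space_bytes(n=128, r=2)         # 32768 bytes = 32 KiB
    scrypt_v = v_space_bytes(n=1024, r=1)     # 131072 bytes = 128 KiB
    print(neo_v, scrypt_v, scrypt_v // neo_v)  # 32768 131072 4

    # 512 bytes for block mixing + 800 bytes for FastKDF and BLAKE2s.
    need = private_dwords(512, 800)           # 328
    ASSUMED_VGPR_BUDGET = 256                 # assumption, see lead-in
    print(need, need > ASSUMED_VGPR_BUDGET)   # 328 True
```

Under these assumptions NeoScrypt needs a quarter of scrypt's V space per instance but more private storage than that register budget holds, which is consistent with the private-memory guess.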

My other guess is that something is wrong with the miner itself, related to the scheduling of kernel threads. Increase intensity above 13 and the hash rate drops. Increase it even more and you see HW errors. Set it to 20 and it hangs. Scrypt can do 20, but it’s different. I probably need to start with a clean fork and add NeoScrypt support myself. I have a few other ideas, but they also need work.

Wolf0

I feel stupid. For some reason, I was thinking of GCN cards while talking about 6xxx. Oops.

Wolf0

Preparing my GCN kernel for public release: cleaning up the code and removing things I tried that really sucked, like completely unrolled ChaCha/Salsa. After that, I’ll package it up with SGMiner and it should be good to go. Should give results like this (NSFW): https://ottrbutt.com/miner/neoscryptwolf-11082014.png

Alpha Wolf

Those numbers look great, can’t wait to try this. :)

Does the version of SGMiner you're building have xIntensity, or have you given any thought to building on cgminer 3.7.3 Kalroth, which has xIntensity?

More info can be found here; that page states that the new SGMiner 4.1 has xIntensity and might be a better choice. Personally, I like cgminer better and have had better results with it than with sgminer so far.

Wolf0

Doesn’t matter; the kernel can be used with both.

EDIT: It can be used with any CGMiner/SGMiner that has Neoscrypt support, that is.

daimyo

WHOOOOOOOOOOOA!!!

Installed 14.9 drivers and got cgminer 3.8.7

The result:

hashrate jumped from 95 to 135!!! :))) Same temps!!!

Wolf0

Was that your first time using my fixed kernel on 14.9?

daimyo

Actually, yes… I guess I am a bit slow on those updates :D Good job! Thanks for your involvement.

Wolf0

No problem; you should be getting more hash soon!

slowhash

Let me step right up and personally thank you for the development you have done on this.

Post a btc address and I’ll send you a couple satoshi, or post a guncoin address and I’ll send you a couple thousand. ;)

xIIImaL

neoscrypt_vliw.cl v2

It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

It doesn’t work for me now. Cards in rig: 6950, 6870, 5870; miner 3.7.7b.

insanid

Post more information. For example:

Windows version, and whether it is 64- or 32-bit

AMD Catalyst driver version

What worksize are you using?

Have you set the following environment variables?

GPU_MAX_ALLOC_PERCENT=100

GPU_USE_SYNC_OBJECTS=1

Also, upgrade to 3.7.7c or use sgminer.
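For anyone launching the miner from a script rather than a .bat file, here is a minimal Python sketch that passes those two variables to the miner process only. The helper names are hypothetical, as are the executable and config file names you would pass in; `-c` is the cgminer/sgminer option for loading a config file.

```python
import os
import subprocess


def miner_env(base=None):
    """Copy of the environment with the two variables from the post set."""
    env = dict(os.environ if base is None else base)
    env["GPU_MAX_ALLOC_PERCENT"] = "100"
    env["GPU_USE_SYNC_OBJECTS"] = "1"
    return env


def launch_miner(exe, conf):
    # exe and conf are placeholders for your own miner path and .conf file.
    return subprocess.run([exe, "-c", conf], env=miner_env(), check=False)
```

Usage, with hypothetical file names: `launch_miner("sgminer.exe", "neoscrypt.conf")`. Building a copy of the environment leaves the parent process's variables untouched.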

insanid

I submitted a bug report to the sgminer-dev GitHub regarding the apparent worksize issue in the sgminer-dev binaries on Win64.

https://github.com/sgminer-dev/sgminer/issues/394

xIIImaL

Yep, I forgot these settings: GPU_MAX_ALLOC_PERCENT=100, GPU_USE_SYNC_OBJECTS=1… It works fine on sgminer5 now, thanks! 24 kH/s on a Radeon 6870 :))

Wolf0

Let me step right up and personally thank you for the development you have done on this.

Post a btc address and I’ll send you a couple satoshi, or post a guncoin address and I’ll send you a couple thousand. ;)

I will once the release happens; still waiting on a withdrawal.