Below are the CUDA-Z results.
CUDA-Z Report
Version: 0.9.231 http://cuda-z.sf.net/
OS Version: Windows AMD64 6.1.7601 Service Pack 1
Driver Version: 347.25
Driver Dll Version: 7.0 (8.17.13.4725)
Runtime Dll Version: 6.0
Core Information
| Name | GeForce GTX 960 |
|---|---|
| Compute Capability | 5.2 |
| Clock Rate | 1228 MHz |
| PCI Location | 0:1:0 |
| Multiprocessors | 8 |
| Threads Per Multiproc. | 2048 |
| Warp Size | 32 |
| Regs Per Block | 65536 |
| Threads Per Block | 1024 |
| Threads Dimensions | 1024 x 1024 x 64 |
| Grid Dimensions | 2147483647 x 65535 x 65535 |
| Watchdog Enabled | Yes |
| Integrated GPU | No |
| Concurrent Kernels | Yes |
| Compute Mode | Default |
| Stream Priorities | No |
Memory Information
| Total Global | 2048 MiB |
|---|---|
| Bus Width | 128 bits |
| Clock Rate | 3505 MHz |
| Error Correction | No |
| L2 Cache Size | 48 KiB |
| Shared Per Block | 48 KiB |
| Pitch | 2048 MiB |
| Total Constant | 64 KiB |
| Texture Alignment | 512 B |
| Texture 1D Size | 65536 |
| Texture 2D Size | 65536 x 65536 |
| Texture 3D Size | 4096 x 4096 x 4096 |
| GPU Overlap | Yes |
| Map Host Memory | Yes |
| Unified Addressing | No |
| Async Engine | Yes, Bidirectional |
Performance Information
| Memory Copy | |
|---|---|
| Host Pinned to Device | 2508.83 MiB/s |
| Host Pageable to Device | 1433.22 MiB/s |
| Device to Host Pinned | 1552.14 MiB/s |
| Device to Host Pageable | 1034.73 MiB/s |
| Device to Device | 38.3993 GiB/s |

| GPU Core Performance | |
|---|---|
| Single-precision Float | 2712.13 Gflop/s |
| Double-precision Float | 86.5304 Gflop/s |
| 32-bit Integer | 821.532 Giop/s |
| 24-bit Integer | 602.154 Giop/s |
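
As a rough sanity check on the measured figures above, the nominal peaks can be estimated from the values CUDA-Z reports. This is only a sketch under assumptions that are not in the report itself: 128 CUDA cores per multiprocessor (the usual figure for compute capability 5.2), one fused multiply-add (2 flop) per core per clock, and a memory interface that transfers data twice per reported memory clock. The function names are hypothetical.

```python
def peak_sp_gflops(multiprocessors, cores_per_sm, clock_mhz):
    """Nominal single-precision peak in Gflop/s.

    Assumes each core retires one fused multiply-add (2 flop) per clock.
    """
    return multiprocessors * cores_per_sm * 2 * clock_mhz / 1000.0


def peak_mem_gb_per_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Nominal memory bandwidth in GB/s (10^9 bytes per second).

    Assumes `transfers_per_clock` transfers per reported memory clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


if __name__ == "__main__":
    # Values from the report: 8 multiprocessors, 1228 MHz core clock,
    # 128-bit bus, 3505 MHz memory clock. 128 cores per SM is an assumption.
    print(f"SP peak:  {peak_sp_gflops(8, 128, 1228):.0f} Gflop/s")
    print(f"Mem peak: {peak_mem_gb_per_s(128, 3505):.2f} GB/s")
```

Under these assumptions the nominal single-precision peak comes to roughly 2515 Gflop/s, slightly below the measured 2712.13 Gflop/s. One possible explanation is that the GPU runs above the reported 1228 MHz clock during the benchmark, but the report does not confirm this.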

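Perhaps because the PC I installed the GTX 960 in does not support PCIe 3.0, the result was disappointing: LibSVM, the workload I actually cared about, ran faster on my laptop's CPU.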
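
To put the PCIe suspicion in numbers, the measured host-to-device rate can be compared with the nominal bandwidth of each PCIe generation. The following is a minimal sketch, not a diagnosis: it assumes an x16 link (the actual link width of this machine is not stated), uses the standard per-lane signalling rates and line encodings, and the helper name `link_utilization` is made up for illustration.

```python
# Nominal per-lane PCIe throughput in bytes per second after line encoding.
# Gen 1 and Gen 2 use 8b/10b encoding; Gen 3 uses 128b/130b.
PCIE_LANE_BYTES_PER_S = {
    "PCIe 1.x": 2.5e9 * 8 / 10 / 8,
    "PCIe 2.0": 5.0e9 * 8 / 10 / 8,
    "PCIe 3.0": 8.0e9 * 128 / 130 / 8,
}

MIB = 1024 ** 2  # CUDA-Z reports copy rates in MiB/s


def link_utilization(measured_mib_per_s, lanes=16):
    """Return measured rate as a fraction of each generation's nominal rate.

    `lanes` defaults to 16, which is an assumption about the slot in use.
    """
    measured = measured_mib_per_s * MIB
    return {
        gen: measured / (per_lane * lanes)
        for gen, per_lane in PCIE_LANE_BYTES_PER_S.items()
    }


if __name__ == "__main__":
    # 2508.83 MiB/s is the "Host Pinned to Device" figure from the report.
    for gen, frac in link_utilization(2508.83).items():
        print(f"{gen} x16: {frac:.0%} of nominal bandwidth")
```

With the x16 assumption, the pinned host-to-device rate is about two thirds of the PCIe 1.x nominal figure and about one sixth of the PCIe 3.0 figure, which is at least compatible with an older or narrower link limiting transfers. Whether transfer cost is what actually slows LibSVM here would need profiling to confirm.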