Actually, the GPU memory oversubscription might not be so bad, because CSV is a very inefficient storage format.
Consider: a single-precision float has approximately 7 significant decimal digits and a decimal exponent that ranges from -38 to +38. Written as text, and accounting for the separator, a number will at worst look like this: “x.xxxxxxE-yy,” (ignoring a possible leading minus sign). That is 13 characters, which take up 13 bytes in typical text encodings. The equivalent binary float takes 4 bytes. This means that you can save up to a factor of 3.25 in space just by parsing your data into binary floats in RAM. Of course, typical CSV export algorithms drop trailing zeros and exponents whenever feasible, so in practice you’ll gain a bit less.
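If you want to check that arithmetic yourself, here is a minimal Python sketch. The helper name `text_bytes` and the sample value are made up for illustration, and it assumes an ASCII-compatible encoding with a one-character separator.

```python
import struct

def text_bytes(value: float, digits: int) -> int:
    """Bytes taken by `value` in scientific notation with `digits`
    significant digits, plus one separator character."""
    field = format(value, f".{digits - 1}E")   # e.g. '1.234567E-20'
    return len(field.encode("ascii")) + 1      # +1 for the trailing comma

# Hypothetical worst-case positive single-precision value
single_text = text_bytes(1.234567e-20, 7)
single_binary = struct.calcsize("<f")          # size of a binary float32
ratio = single_text / single_binary

print(single_text, single_binary, ratio)
```

Running the same helper over a sample of your real file would tell you how far your data is from that worst case.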
If your file uses double precision, another trick is to do your computation in single precision, if that is enough for your needs. Starting from the textual representation “x.xxxxxxxxxxxxxxxE-zzz,” (16 significant decimal digits, decimal exponent from -308 to +308), you go from up to 23 characters to 4 bytes, a best-case memory saving of a factor of 5.75, which would get your working set down to about 8.7 GB. In practice the working set will be larger than that, because again most CSV export algorithms remove some redundant information. But in any case, this is getting very close to what your GPU can handle.
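Here is the same kind of rough check for the double-precision case, plus a quick look at what single precision costs you in accuracy. This is only a sketch: the sample values are made up, and whether the resulting error is acceptable depends entirely on your computation.

```python
import struct

# Hypothetical worst-case double-precision field, with its separator
worst_case_field = format(1.234567890123456e-200, ".15E") + ","
text_bytes = len(worst_case_field.encode("ascii"))
float32_bytes = struct.calcsize("<f")
best_case_ratio = text_bytes / float32_bytes

# Round-trip a value through float32 to see the precision you give up
x = 0.1
x32 = struct.unpack("<f", struct.pack("<f", x))[0]
relative_error = abs(x32 - x) / x

print(text_bytes, float32_bytes, best_case_ratio, relative_error)
```

Keep in mind that single precision also has a much smaller exponent range, so values with magnitudes beyond roughly 1E±38 will not survive the conversion.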
Once you get there, you may want to consider whether you really need to run your computation on that full dataset. If dropping a fraction of it (say, half) is acceptable, you might very well get down to something that fits in your GPU’s memory and leaves you enough space for the actual computation.
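As a minimal sketch of what dropping half the data could look like, the following streams a CSV and keeps only every other row, storing the values as single-precision floats. The function name `load_subsampled` and the `keep_every` parameter are hypothetical, and it assumes a header-less, purely numeric file where skipping rows at a regular interval is statistically acceptable for your problem.

```python
import csv
import io
from array import array

def load_subsampled(lines, keep_every=2):
    """Parse CSV rows into a flat float32 array, keeping one row
    out of every `keep_every`."""
    data = array("f")
    for index, row in enumerate(csv.reader(lines)):
        if index % keep_every == 0:
            data.extend(float(field) for field in row)
    return data

# Tiny in-memory stand-in for a real file opened with open(path, newline="")
sample = io.StringIO("1.0,2.0\n3.0,4.0\n5.0,6.0\n7.0,8.0\n")
subset = load_subsampled(sample, keep_every=2)

# Actual in-memory footprint of the values that were kept
print(len(subset), subset.itemsize * len(subset))
```

Since the file is read row by row, the full dataset never has to sit in RAM at once, and the printed byte count gives you the real footprint rather than the CSV file size.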
[ul]
[li]Do not take CSV file sizes too literally, as CSV is a very inefficient data storage format. Measure your actual RAM footprint.[/li]
[li]If your data is double precision and your computation only needs single precision, you might actually get pretty close to your GPU’s memory capacity.[/li]
[li]If you can safely drop a fraction of your dataset, you can probably get into something that entirely fits in your GPU, and leaves enough space for the computation.[/li]
[li]Otherwise, you can reduce the impact of round-trips from system memory to GPU memory by overlapping data transfers and compute, as Salabar mentioned.[/li]
[/ul]