CUDA is a form of parallel computing from NVIDIA which offers
processors and threads, programmable in C, on graphics cards. When it
concerns possible MIPS/FLOPS power, it sends quite a few CPUs and even
mild FPGA processing solutions potentially crying for their mothers
(or so).
As it goes with parallel computers, including most modern
supercomputer farms, the effectiveness actually achieved by programs
doing some useful computation work on them depends on how well the
problem lends itself to parallelisation, and on how well the latency
and bandwidth of the parallel grid behave in practice.
I've used CUDA a bit lately, not on a desirable i7/GTX 260 but on a
3 GHz Pentium D with a 512 MB GeForce 9500 GT NVIDIA graphics card
(16x PCI Express). Since all the examples compiled great on the
Fedora 10 64-bit OS I used for it (gcc 4.3, I believe) with CUDA 2.1,
I thought I'd try the new toy a bit and see what I can do with it. So
I recently made a Cinepaint plugin with it and have successfully
experimented with a CUDA-extended Tcl/Tk interpreter.
I wouldn't say a beauty prize would be in place, but things can
work. The specs of even a mild CUDA card (compared with a whopping
4 tera-ops/sec and double precision floats from a Tesla rack) are such
that, apart from the advantage of a processor separate from the CPU
running intensive computations and the fun of using powerful new
technology with other limiting factors than standard CPUs, the actual
raw computing power and, not to forget, the practical memory bandwidth
are quite attractive compared to even really fast PCs, depending on
the type of algorithm being used. And the cards certainly aren't
expensive.
A Test Setup
I want to try out a standard setup of audio and MIDI input and audio
output to a Linux program (on Fedora) with JACK and ALSA, and see
whether the audio data can be transferred to and from the CUDA
processors on the graphics card with bearable latency.
I made a test program in which a CUDA routine is called together with
the audio frame processing. It is an example CUDA kernel plus the I/O
of its arrays, and it takes about 1.3 ms to communicate and execute,
so it should run well in conjunction with a 256-sample buffer at
44.1 kHz (about 5.8 ms per buffer).
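The timing argument above is simple arithmetic: one buffer lasts
frames / sample rate seconds, and the GPU round trip has to fit inside
that. A minimal sketch in C follows; the function names are made up
for illustration and are not part of the test program, and the 1.3 ms
figure is simply the measured value quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Duration of one audio buffer in milliseconds.
   frames: samples per buffer; sample_rate_hz: samples per second. */
double buffer_period_ms(unsigned frames, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)frames / sample_rate_hz;
}

/* Time left in one buffer period after the GPU round trip, in ms.
   A negative result means the round trip alone overruns the period. */
double headroom_ms(double gpu_round_trip_ms, unsigned frames,
                   double sample_rate_hz)
{
    return buffer_period_ms(frames, sample_rate_hz) - gpu_round_trip_ms;
}

/* True if the GPU round trip fits inside one buffer period. */
bool fits_in_period(double gpu_round_trip_ms, unsigned frames,
                    double sample_rate_hz)
{
    return headroom_ms(gpu_round_trip_ms, frames, sample_rate_hz) > 0.0;
}
```

With the numbers from the text, buffer_period_ms(256, 44100.0) comes
to about 5.8 ms, so a 1.3 ms round trip leaves roughly 4.5 ms per
buffer for everything else. The same round trip would not fit in a
32-sample buffer, which lasts only about 0.73 ms.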
In practice the program runs, and I'm working on linking the CUDA part
into the audio loop. First I want it to skip frames only very rarely,
by making the CPU audio loop empty and working on the way the various
procedures are called (maybe a JACK library compiled by nvcc). My
current method can work reasonably with a select() call, I think, but
even then it probably isn't really up to much multitasking: it skips
frames every now and then. Maybe I can get it to behave; I spent only
one night on it, and didn't do anything special (yet).
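The text does not say what the select() call waits on. As an
illustration only, the sketch below assumes a file descriptor that
becomes readable when a new audio period is ready (for example a pipe
written by the audio thread), and waits on it with a timeout instead
of spinning. The helper name wait_readable is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Block until fd is readable or timeout_us microseconds have passed.
   Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
int wait_readable(int fd, long timeout_us)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    /* select() can only watch descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_us >= 0);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Rebuild the timeout on every call: Linux overwrites it. */
    tv.tv_sec = timeout_us / 1000000L;
    tv.tv_usec = timeout_us % 1000000L;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A timeout a little shorter than one buffer period would let the loop
notice a late period and count it as a skipped frame, rather than
blocking indefinitely.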
I've prepared the whole chain, from Bwise blocks with Maxima formulas
to a compiled program, so that it can also be used with CUDA. A working
Fortran to C converter which I can run can probably work with CUDA,
and that seems like a nice idea. In principle it could even run on a
web server! For instance to generate audio waves from formulas or
block connections in a short time, and make good graphics from them,
by parallel processing.
Purpose
Probably I'll want to make my formula rendering software work on CUDA,
or on anything that would seem reasonable, to take advantage of the
not so much cache-troubled and quite powerful parallel processor.
That includes the (closed source) surround reverb I made some time
ago.
Clearly my string simulator lives almost in a class of its own,
computation intensity wise, and it could even seem the architecture is
a bit made for its type of computations...
Also, powerful audio graphics can be made: CUDA can talk directly to
OpenGL for high bandwidth and high resolution graphs and interfaces.
Interested People
Probably this won't be a Free and Open Source project, because I don't
believe that's reasonable anymore, since no work, money, or kudos have
been sent to me, and I'm not working on the subject of making myself
only free and open sourced! I still do believe in the thought, and so
of course free software and open ideas will remain and some new ones
will be made, but this subject is not one of them.
People may have a scientific interest in it, so I'll describe some of
the ideas I use in limited but usable detail. Probably people will not
mention me as motivation, or as the engineer who effectively gave them
those ideas, and will take them without acknowledging me, but they'll
find the whole setup cannot just be copied, it's too complicated for
that. And God and homeland, and many free and self-working people and
engineers, will catch on and hopefully rock the whole of the patent
suckers anyhow. At least I can look in the mirror every morning and
not despise myself for being a dreadful and pitiful human being, and I
continue to believe in and support the work and ideas of the FSF and
others (like Fedora and the people who made the CUDA environment work
for free).