Tag Archives: CUDA

Declaring dependencies with cudaStreamWaitEvent

cudaStreamWaitEvent is a very useful synchronization primitive which takes two arguments as input: a stream and an event. Even if this is not clear from its name, it is a non-blocking function: all operations enqueued in the stream after the call to cudaStreamWaitEvent will only be unlocked when the event is triggered. A simple example: in [...]
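
As a minimal sketch of the idea (the kernels and sizes below are illustrative, not taken from the post): work is enqueued in stream1, an event is recorded after it, and stream2 is told to wait on that event, so the second kernel only starts once the first has finished, while the host thread returns immediately.

#include <cuda_runtime.h>

__global__ void kernelA(float *data) { data[threadIdx.x] += 1.0f; }
__global__ void kernelB(float *data) { data[threadIdx.x] *= 2.0f; }

int main(void)
{
    float *d_data;
    cudaMalloc((void **)&d_data, 256 * sizeof(float));
    cudaMemset(d_data, 0, 256 * sizeof(float));

    cudaStream_t stream1, stream2;
    cudaEvent_t event;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);
    cudaEventCreate(&event);

    kernelA<<<1, 256, 0, stream1>>>(d_data);   // work enqueued in stream1
    cudaEventRecord(event, stream1);           // event fires once kernelA is done
    cudaStreamWaitEvent(stream2, event, 0);    // non-blocking: returns immediately
    kernelB<<<1, 256, 0, stream2>>>(d_data);   // held back until the event is triggered

    cudaStreamSynchronize(stream2);

    cudaEventDestroy(event);
    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(d_data);
    return 0;
}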


Accessing pinned host memory directly from the device

When it comes to optimizing data transfers in CUDA, ensuring that we use pinned memory is critical. One way to use such pinned memory is to ask CUDA to allocate host memory with the cudaMallocHost function. With the UVA (Unified Virtual Addressing) mechanism added in CUDA 4.0, there is an additional behaviour that is worth [...]
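
A minimal sketch of that behaviour, assuming a 64-bit platform where UVA is in effect (the kernel name incr and the buffer size are illustrative, not from the post): host memory allocated with cudaMallocHost can be dereferenced directly from a kernel using the very same pointer, without an explicit cudaMemcpy.

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void incr(int *p)      // p points to pinned *host* memory
{
    p[threadIdx.x] += 1;          // the GPU accesses host memory over the bus
}

int main(void)
{
    int *h_data;
    cudaMallocHost((void **)&h_data, 32 * sizeof(int));  // pinned (page-locked) host allocation
    for (int i = 0; i < 32; i++) h_data[i] = i;

    // With UVA, the host pointer is also valid on the device.
    incr<<<1, 32>>>(h_data);
    cudaDeviceSynchronize();

    printf("h_data[5] = %d\n", h_data[5]);               // expected: 6

    cudaFreeHost(h_data);
    return 0;
}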
