CUDA memset in Compute Visual Profiler
I use the Compute Visual Profiler to measure the performance of my CUDA programs.
The profiler output shows two different entries for the cudaMemset function:
- memset32_post
- memset128
What is the difference between the two?
My guess is that the memset128 kernel does the bulk of the work, and the memset32_post kernel cleans up the remainder, since the size used is not a multiple of 128.
There's nothing to worry about: it's just trying to implement memset in the most efficient manner possible, although I'd try to avoid calling memset in an inner loop (on the processor). If you're worried, over-allocate so that the size is a multiple of 128.
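For what it's worth, the over-allocation idea comes down to a bit of host-side arithmetic. This is only a sketch: the 128-byte granularity is an assumption carried over from the guess above (the profiler names could just as well refer to 128-bit stores), and `kGranularity`, `round_up` and `remainder_bytes` are made-up helper names, not part of the CUDA runtime.

```cpp
#include <cassert>
#include <cstddef>

// Assumed granularity in bytes. This comes from the guess above,
// not from NVIDIA documentation.
const std::size_t kGranularity = 128;

// Round `bytes` up to the next multiple of `granularity`.
inline std::size_t round_up(std::size_t bytes, std::size_t granularity) {
    assert(granularity > 0);
    return (bytes + granularity - 1) / granularity * granularity;
}

// Bytes left over after the last full chunk. A non-zero value is the
// part the guessed "remainder" kernel would have to handle.
inline std::size_t remainder_bytes(std::size_t bytes, std::size_t granularity) {
    assert(granularity > 0);
    return bytes % granularity;
}
```

You would then pass `round_up(n, kGranularity)` as the size to both `cudaMalloc` and `cudaMemset`. Whether that actually makes the second kernel disappear from the profiler is something you'd have to check on your own setup.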