CUDA memset in Compute Visual Profiler


I use the Compute Visual Profiler to measure the performance of my CUDA programs.

The profiler output shows two different entries for the cudaMemset function:

  1. memset32_post
  2. memset128

What is the difference between these two?

[Screenshot of the profiler output]

I would guess that the memset128 kernel does the bulk of the work, and the memset32_post kernel then cleans up the remainder, since you used a size that is not a multiple of 128.

There's nothing to worry about. It's just trying to implement the memset in the most efficient manner possible, although I'd try to avoid doing a memset in an inner loop (on the host). If you're worried, you could over-allocate.
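As a rough illustration of the over-allocation idea, here is a minimal sketch. It assumes the guess above is right and that 128 bytes is the relevant granularity; that figure is inferred only from the kernel name memset128, not from any documentation. The helper `roundUp` and the buffer size are hypothetical names and values chosen for the example.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: round a byte count up to the next multiple of
// `granularity`. The default of 128 is an assumption based on the
// kernel name memset128 seen in the profiler.
constexpr size_t roundUp(size_t bytes, size_t granularity = 128) {
    return (bytes + granularity - 1) / granularity * granularity;
}

int main() {
    const size_t wanted = 1000 * sizeof(float);  // 4000 bytes, not a multiple of 128
    const size_t padded = roundUp(wanted);       // 4096 bytes

    // Over-allocate so the cleared size is a whole number of 128-byte chunks.
    float *d_buf = nullptr;
    cudaError_t err = cudaMalloc(&d_buf, padded);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Clear the padded buffer once, outside any hot loop on the host.
    err = cudaMemset(d_buf, 0, padded);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemset failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_buf);
        return 1;
    }

    printf("requested %zu bytes, allocated and cleared %zu bytes\n", wanted, padded);
    cudaFree(d_buf);
    return 0;
}
```

Whether the profiler then reports only memset128 depends on how the runtime splits the work internally, so it is worth checking in your own profile rather than assuming it.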

