Write a GPU kernel histogram() that scans all scores and updates the histogram. The kernel should use a single histogram, stored in global memory, that is shared by all threads.
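One possible shape for histogram() is sketched below. The task does not say how a score maps to a bin, so the binning rule here (integer scores, bin = score / bin_width, clamped to the valid range) and the parameters num_bins and bin_width are assumptions made for illustration. Because every thread updates the same global array, the increment has to be atomic.

```cuda
#include <cuda_runtime.h>

// Sketch only. Assumed binning rule (not given in the task):
// bin = score / bin_width, clamped to [0, num_bins - 1].
// hist must be zero-initialized in global memory before the launch.
__global__ void histogram(const int *scores, int n, unsigned int *hist,
                          int num_bins, int bin_width)
{
    // Grid-stride loop so any launch configuration covers all n scores.
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        int bin = scores[i] / bin_width;
        if (bin < 0) bin = 0;
        if (bin >= num_bins) bin = num_bins - 1;
        // All threads share one global histogram, so the update is atomic.
        atomicAdd(&hist[bin], 1u);
    }
}
```

A launch such as histogram<<<num_blocks, threads_per_block>>>(d_scores, n, d_hist, num_bins, bin_width) would work with this sketch, where d_scores and d_hist are hypothetical device pointers. When many scores fall into the same bin, the atomic updates to global memory serialize, which is the cost the privatized version aims to reduce.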
Write a GPU kernel privatized_histogram() that creates a private histogram in shared memory for each thread block and has the threads in each block update their block's private histogram. Once a thread block finishes processing its scores, it merges its private histogram into the global histogram.
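A minimal sketch of privatized_histogram() follows. As before, the binning rule (bin = score / bin_width, clamped) and the parameters num_bins and bin_width are assumptions, since the task does not define them. The sketch also assumes the private histogram is sized at launch time through dynamic shared memory.

```cuda
#include <cuda_runtime.h>

// Sketch only. Assumed binning rule (not given in the task):
// bin = score / bin_width, clamped to [0, num_bins - 1].
// Launch with num_bins * sizeof(unsigned int) bytes of dynamic shared memory.
__global__ void privatized_histogram(const int *scores, int n,
                                     unsigned int *hist,
                                     int num_bins, int bin_width)
{
    // One private histogram per thread block, held in shared memory.
    extern __shared__ unsigned int local_hist[];

    // Threads cooperatively zero the private histogram.
    for (int b = threadIdx.x; b < num_bins; b += blockDim.x)
        local_hist[b] = 0;
    __syncthreads();

    // Each thread accumulates its scores into the block's private copy.
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        int bin = scores[i] / bin_width;
        if (bin < 0) bin = 0;
        if (bin >= num_bins) bin = num_bins - 1;
        atomicAdd(&local_hist[bin], 1u);
    }
    __syncthreads();

    // After the block is done, merge its counts into the global histogram.
    for (int b = threadIdx.x; b < num_bins; b += blockDim.x)
        if (local_hist[b] > 0)
            atomicAdd(&hist[b], local_hist[b]);
}
```

With this sketch, a launch would look like privatized_histogram<<<num_blocks, threads_per_block, num_bins * sizeof(unsigned int)>>>(d_scores, n, d_hist, num_bins, bin_width), with d_scores and d_hist as hypothetical device pointers. Atomics are still needed inside the block, but contention is limited to threads of one block on shared memory, and each block issues at most num_bins atomic updates to global memory.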