Basic documentation is here, more verbose information is here.
Sometimes people wonder why their multi-thread application doesn't appear to be using all available cores. This generally happens for two reasons:
I thought it might be illuminating to provide a visual example of this, so I took a simple Java test program I wrote a while back and stuffed it into a trivial SDK-generated Android application, added some custom trace calls, then ran it under systrace.
When you hit the start button, the APK starts four threads, each of which do a trivial computation to occupy the CPU. They finish in about two seconds. If four CPU cores are engaged, the whole test should take about two seconds to run, because the threads will execute in parallel. If only one core is used, the test should take about eight seconds to run.
The traces were captured on a Nexus 4 running Android 4.3 (API 18). This release introduced android.os.Trace, which allows you to create your own tags for systrace output. I used it here to wrap the execution of the spin loops, so you can easily see their behavior in the trace.
Using SDK 22.0.5, the commands I used were:
cd android-sdk-linux/platform-tools/systrace python systrace.py --app=com.faddensoft.multicoretest gfx view sched dalvikThe
--appargument is new in Android 4.3. It tells systrace to show the data generated by the
android.os.Tracecalls for the specified app, identified by the app package name.
The high level view looks like this:
Process 10268 is my application. I've trimmed out nearly all of the other rows, so you can't really see UI activity. The topmost row is the main thread, which has a bit of activity at first when I hit the "start" button on screen. A little bit later the four worker threads are spawned.
Each of the worker threads starts by opening a trace tag, "spin_N". This makes it easy to see the threads in the visualizer. Without this, you would have to pick the threads out of the CPU row by thread ID.
Looking at the wide "spin_0" bar in the lower half of the graphic, you can see where the test starts and ends on that thread relative to all the other activity in the system. Right above (touching) the thick green bar is a thin line that starts out grey on the left and then changes to alternating green/blue. This is the thread status: grey is sleeping, green is running, and lavender is runnable. Each test thread begins by sleeping for 250ms, which is why you see the grey section before the test starts running.
The "runnable" state is the most interesting for us, because it means the thread could be running but isn't scheduled on a CPU.
If we zoom in a bit at a point shortly after the tests have started, we can see that the "CPU" rows show the name of the thread that is executing on that core:
You can see that spin_0, spin_1, spin_2, and spin_3 are all being scheduled on CPU 0 or CPU 1, and CPU 2 and CPU 3 are idle. There's some really skinny bits interspersed (e.g. a thin magenta block between two spin_1 chunks on CPU 0 near 1028 ms); this is other stuff in the system trying to get work done. If you zoomed in further, or clicked on the blocks in the trace viewer, you would get the full details.
You can confirm that the color of the skinny line above the thread rows matches up with the threads being scheduled on the CPUs.
(The labels on the CPU rows are thread names, provided to the kernel by the Dalvik VM, while the labels that would appear on the thread ("10268:") rows are trace tags. I made them the same to make it easy to match them up, but generally they won't be equal. The color of each block is determined by hashing the label string, so everything with the same name has the same color. So in this case, where the trace tag labels aren't visible on the thread rows, you can just match up the colors.)
About 80ms after the test starts, something marvelous happens: the other CPU cores are woken up:
If you look at the skinny lines above the threads in the lower part of the screen, you can see on the left that they're alternating green and lavender (running and runnable) as the scheduler gives them time slices. After the other cores start up, they're mostly just running. If you go back to the first "high level view" image, you can see that the spin_0 thread goes in and out of running/runnable quite frequently while the others are mostly running; this is because other stuff in the system is occasionally scheduled to run, and most of it is happening on CPU 0. Which is also why spin_0 is the last to finish.
If you download the full trace, you can see that the kernel schedules a bit of work on cores 2 and 3 after the test finishes, but then stops using them. By specifying additional trace tags it's possible to gather related information, such as the CPU clock frequency, and observe the power management at work.
It should be noted that the delay between starting a multi-threaded test and waking additional CPU cores varies significantly based on the device and the circumstances. The system needs to (1) decide that there's enough going on to merit waking a core; (2) actually wake the core, which is not instantaneous; (3) migrate the thread.
You can view the full trace (NOTE: might require that you use the Chrome browser) or download the test app sources and try it yourself.
The visualizer is embedded in the HTML output, and changes from release to release, so the current documentation on the Android web site may not match up with the interface on the trace. Basic instructions: use WASD (like an FPS video game) to zoom in/out and move left/right. Click on things with the mouse to get more detail. Click on the timeline at the top to create vertical lines.
Please post all questions to stackoverflow
android tag (and, for systrace-specific questions,
systrace tag). Comments or feature requests for the framework
or CTS tests should be made on
the AOSP bug tracker.