Contributing to Slim
Thanks for considering a contribution. This document covers the most common patterns and the testing/validation discipline used across the codebase.
Setup
git clone https://github.com/iamjosephmj/Slim.git
cd Slim
./gradlew :nativekt:assembleDebug :nativekt:testDebugUnitTest
You'll need:
- JDK 17+ (Android Studio's bundled JBR works).
- Android SDK + NDK 27 (API 36 platform).
- For on-device validation: an arm64-v8a Android 12+ device.
What we welcome
In rough priority order:
- New encoder helpers in Arm64.kt covering ARMv8.2-A or ARMv8.4-A instructions we don't yet have. The pattern is mechanical and self-contained (see §1 below).
- Per-vendor bypass tweaks when one of the four bypass tiers fails on a device we haven't tested. The cascade gracefully falls through; PRs that add new fallback paths or per-OEM fixes are great.
- Slim cookbook recipes: interesting NEON kernels worth sharing.
- Bug fixes with a regression test.
- Documentation improvements — clearer KDoc, better examples, typo fixes.
For larger structural work (encoder restructuring, the V3 compile-time plugin, ARMv7 support), open an issue first to discuss design.
1. Adding an encoder helper
This is the most common contribution. The pattern is identical for every instruction:
Step 1: Find the encoding
Look up the instruction in the ARM Architecture Reference Manual (DDI 0487), or just disassemble a reference:
$ cat > /tmp/t.s <<'EOF'
.text
.global _start
_start:
fmla v0.4s, v1.4s, v2.4s
EOF
$ clang --target=aarch64-linux-android -c /tmp/t.s -o /tmp/t.o
$ llvm-objdump -d /tmp/t.o
0: 4e22cc20 fmla v0.4s, v1.4s, v2.4s
So fmla v0.4s, v1.4s, v2.4s encodes to 0x4e22cc20.
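To see which field each part of that word belongs to, it can help to split it apart. The following is a minimal sketch, assuming the Advanced SIMD "three same" floating-point layout (0 | Q | U | 01110 | 0 | sz | 1 | Rm | opcode | 1 | Rn | Rd); `Vec3Fields`, `bits`, and `decodeVec3` are illustrative names, not part of the Slim codebase.

```kotlin
// Illustrative decoder for the AdvSIMD "three same" FP layout (names are hypothetical).
data class Vec3Fields(
    val q: Int, val u: Int, val sz: Int,
    val rm: Int, val opcode: Int, val rn: Int, val rd: Int,
)

/** Extracts `width` bits of `word`, starting at bit `lsb`. */
fun bits(word: Int, lsb: Int, width: Int): Int =
    (word ushr lsb) and ((1 shl width) - 1)

fun decodeVec3(word: Int) = Vec3Fields(
    q = bits(word, 30, 1),       // 1 = 128-bit vector
    u = bits(word, 29, 1),
    sz = bits(word, 22, 1),      // 0 = single precision, 1 = double
    rm = bits(word, 16, 5),
    opcode = bits(word, 11, 5),
    rn = bits(word, 5, 5),
    rd = bits(word, 0, 5),
)

fun main() {
    // fmla v0.4s, v1.4s, v2.4s
    println(decodeVec3(0x4e22cc20))
}
```

For 0x4e22cc20 this yields Q=1, U=0, sz=0, Rm=2, opcode=0b11001, Rn=1, Rd=0, which matches the operands in the disassembly.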
Step 2: Add the helper
In nativekt/src/main/kotlin/io/simdkt/nativekt/engine/Arm64.kt, add
to the appropriate section. For NEON FP 3-register ops, find the
existing fpVec3 template and add:
fun fmla(rd: V, rn: V, rm: V, arr: VArr): Int {
    val sz = arr.size and 0b1
    return fpVec3(arr.q, 0, sz, 0b11001, rd.n, rn.n, rm.n)
}
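For readers who have not opened Arm64.kt yet, a bit-pack helper of this kind might look roughly like the sketch below. `fpVec3Sketch` is a hypothetical stand-in, not the repository's fpVec3; its parameter order (q, u, sz, opcode, rd, rn, rm) is an assumption inferred from the call above, and the field positions follow the Advanced SIMD "three same" floating-point layout.

```kotlin
// Hypothetical stand-in for a private bit-pack helper; the real fpVec3 may differ.
fun fpVec3Sketch(q: Int, u: Int, sz: Int, opcode: Int, rd: Int, rn: Int, rm: Int): Int =
    ((q and 1) shl 30) or
        ((u and 1) shl 29) or
        (0b01110 shl 24) or            // fixed bits 28..24
        ((sz and 1) shl 22) or
        (1 shl 21) or                  // fixed bit 21
        ((rm and 0x1f) shl 16) or
        ((opcode and 0x1f) shl 11) or
        (1 shl 10) or                  // fixed bit 10
        ((rn and 0x1f) shl 5) or
        (rd and 0x1f)

fun main() {
    // fmla v0.4s, v1.4s, v2.4s: Q=1, U=0, sz=0, opcode=0b11001
    val word = fpVec3Sketch(q = 1, u = 0, sz = 0, opcode = 0b11001, rd = 0, rn = 1, rm = 2)
    println("%08x".format(word))
}
```

Masking each field before shifting keeps an out-of-range register number from spilling into a neighbouring field, which makes mistakes easier to spot in a golden-byte test.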
For unfamiliar encoding shapes, add a new private bit-pack helper.
Look at the existing helpers (addSubReg, logicalImm, etc.) for
patterns.
Step 3: Write a golden-byte test
In nativekt/src/test/kotlin/io/simdkt/nativekt/engine/Arm64Test.kt:
@Test fun fpVector() {
    assertEnc(0x4e22d420.toInt(),
        Arm64.fadd(Arm64.V0, Arm64.V1, Arm64.V2, Arm64.VArr.S4),
        "fadd v0.4s, v1.4s, v2.4s")
    // ... your new instruction here
    assertEnc(0x4e22cc20.toInt(),
        Arm64.fmla(Arm64.V0, Arm64.V1, Arm64.V2, Arm64.VArr.S4),
        "fmla v0.4s, v1.4s, v2.4s")
}
For a fresh instruction group, add a new @Test fun ... with all the
relevant variants.
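One detail worth knowing when writing golden values: a Kotlin hex literal above 0x7fffffff is typed as Long, so it needs .toInt() before it can be compared with an encoder result. The sketch below illustrates this with a simplified assertion helper; `assertEncSketch` is a hypothetical stand-in and does not reproduce the real assertEnc or its message format.

```kotlin
// Hypothetical stand-in for the test suite's assertEnc helper.
fun assertEncSketch(expected: Int, actual: Int, asm: String) {
    if (expected != actual) {
        throw AssertionError(
            "%s: expected 0x%08x, got 0x%08x".format(asm, expected, actual)
        )
    }
}

fun main() {
    // 0x4e22cc20 fits in an Int, so .toInt() is harmless but optional here.
    assertEncSketch(0x4e22cc20, 0x4e22cc20.toInt(), "fmla v0.4s, v1.4s, v2.4s")

    // 0xd65f03c0 (ret) exceeds Int.MAX_VALUE: the literal is a Long and needs .toInt().
    val ret: Int = 0xd65f03c0.toInt()
    assertEncSketch(ret, ret, "ret")
    println("ok")
}
```

Writing .toInt() on every golden value, as the existing tests do, keeps the call sites uniform regardless of whether the top bit is set.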
Step 4: Forward to Arm64Emitter
In slim/Arm64Emitter.kt, add the auto-emit forwarder:
fun fmla(rd: Arm64.V, rn: Arm64.V, rm: Arm64.V, arr: Arm64.VArr) {
    emit(Arm64.fmla(rd, rn, rm, arr))
}
This makes it usable inside slim {} blocks as fmla(V0, V1, V2, S4).
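The reason a forwarder is enough is the usual Kotlin receiver-lambda pattern: the block runs with the emitter as `this`, so every member function becomes callable by its bare name. A miniature version, under the assumption that the real emitter works along these lines, could look as follows; `EmitterSketch` and `kernelSketch` are hypothetical names, and the two instructions use fixed, well-known encodings instead of the Arm64 helpers.

```kotlin
// Hypothetical miniature of the emitter pattern; not the repository's actual types.
class EmitterSketch {
    private val words = mutableListOf<Int>()

    /** Appends one 32-bit instruction word to the kernel being built. */
    fun emit(word: Int) {
        words += word
    }

    // Forwarders: each one encodes an instruction and emits it.
    fun nop() = emit(0xd503201f.toInt())  // nop
    fun ret() = emit(0xd65f03c0.toInt())  // ret

    fun toWords(): List<Int> = words.toList()
}

/** Runs `body` with a fresh emitter as receiver and returns the emitted words. */
fun kernelSketch(body: EmitterSketch.() -> Unit): List<Int> =
    EmitterSketch().apply(body).toWords()

fun main() {
    val words = kernelSketch {
        nop()
        ret()
    }
    println(words.joinToString { "%08x".format(it) })
}
```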
Step 5: Verify
Run ./gradlew :nativekt:testDebugUnitTest. If the test fails, the message tells you exactly which bits are wrong:
XOR'ing the expected and actual encodings gives the bit difference (for example, 0x1000 = bit 12), which
points at the field you mis-encoded.
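The XOR step is easy to do by hand, but a few lines of Kotlin make it repeatable. This is a small sketch, not part of the test suite; `differingBits` is an illustrative name.

```kotlin
/** Returns the positions of the bits that differ between two encodings, lowest first. */
fun differingBits(expected: Int, actual: Int): List<Int> {
    val diff = expected xor actual
    return (0 until 32).filter { ((diff ushr it) and 1) == 1 }
}

fun main() {
    // A single wrong bit: 0x4e22cc20 xor 0x4e22dc20 = 0x1000, i.e. bit 12.
    println(differingBits(0x4e22cc20, 0x4e22dc20))

    // fadd vs. fmla differ in bits 11 and 12, both inside the opcode field (bits 15..11).
    println(differingBits(0x4e22d420, 0x4e22cc20))
}
```

Comparing the reported positions with the field boundaries of the instruction class usually narrows the mistake down to one argument of the bit-pack helper.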
Naming conventions
- Match ARM assembly mnemonics: add, sub, fmla, etc.
- For instructions that overload by operand type (register vs. immediate vs. vector), use Kotlin overload resolution: same name, different parameter types. The add family in Arm64Emitter is the reference example.
- Vector-specific names: keep the architectural name (fmla, not vfmla).
- Convention: *W/*X suffix when the same op exists on 32 vs. 64-bit registers and overload resolution can't disambiguate (scvtfS, scvtfD).
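The overload rule can be illustrated with a self-contained sketch. The operand types `XReg` and `Imm12` and the object `OverloadSketch` are hypothetical and exist only for this example; the base opcodes are the standard 64-bit ADD (shifted register, no shift) and ADD (immediate) encodings.

```kotlin
// Hypothetical operand types, for illustration only.
data class XReg(val n: Int)
data class Imm12(val value: Int)

object OverloadSketch {
    /** add Xd, Xn, Xm (shifted-register form, shift amount 0). */
    fun add(rd: XReg, rn: XReg, rm: XReg): Int =
        0x8b000000.toInt() or (rm.n shl 16) or (rn.n shl 5) or rd.n

    /** add Xd, Xn, #imm12 (immediate form, no shift). */
    fun add(rd: XReg, rn: XReg, imm: Imm12): Int =
        0x91000000.toInt() or (imm.value shl 10) or (rn.n shl 5) or rd.n
}

fun main() {
    val x0 = XReg(0); val x1 = XReg(1); val x2 = XReg(2)
    // Same name at both call sites; the third argument's type picks the encoding.
    println("%08x".format(OverloadSketch.add(x0, x1, x2)))        // add x0, x1, x2
    println("%08x".format(OverloadSketch.add(x0, x1, Imm12(4))))  // add x0, x1, #4
}
```

A suffix only becomes necessary when two forms would take identical parameter types, since the compiler then has nothing to choose on.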
2. Validating a bypass tweak on a new device
If Slim.initialize fails on a device we haven't tested:
- Reproduce on the device with logcat filtered to the tags nk and nk-jni.
- The cascade reports which tier failed. Common patterns:
  - "bypass: meta-reflection failed": tier 1 always fails on API 31+. Expected.
  - "bypass: direct failed (NoSuchMethodException)": tier 2 is expected to fail on API 36+.
  - "bypass: no targetSdk slot took effect": tier 3 failed. Investigate the VMRuntime field layout for that ROM.
  - "bypass: art::Runtime probe failed": tier 4 failed. Most likely art::Runtime::instance_ isn't exported, or the policy field is past the 8 KB probe window.
- For tier 4 failures, dump libart.so's dynsym:
    adb shell 'cp /apex/com.android.art/lib64/libart.so /data/local/tmp/'
    adb pull /data/local/tmp/libart.so
    llvm-nm -D libart.so | grep -i runtime | grep instance
  If instance_ is missing, we need a different anchor.
- Open an issue with the device model, Android version, and the nk logcat output.
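When reading those log lines, it may help to picture the cascade as "try each tier in order, log the failure, move on". The sketch below shows that control flow only; `Tier`, `runCascade`, and the tier names in `main` are hypothetical, and the real implementation in the runtime layer may be structured differently.

```kotlin
// Hypothetical sketch of a fall-through cascade; not the repository's bypass code.
class Tier(val name: String, val attempt: () -> Boolean)

/** Tries each tier in order and returns the name of the first that succeeds, or null. */
fun runCascade(tiers: List<Tier>, log: (String) -> Unit = { println(it) }): String? {
    for (tier in tiers) {
        val result = runCatching { tier.attempt() }
        if (result.getOrDefault(false)) return tier.name
        val reason = result.exceptionOrNull()?.javaClass?.simpleName ?: "returned false"
        log("bypass: ${tier.name} failed ($reason)")
    }
    return null
}

fun main() {
    val winner = runCascade(
        listOf(
            Tier("first") { throw NoSuchMethodException() },  // fails with an exception
            Tier("second") { false },                         // fails quietly
            Tier("third") { true },                           // succeeds
        )
    )
    println("landed on: $winner")
}
```

A new fallback path then amounts to one more entry in the list, and a failure of every tier shows up as a null result with one log line per tier.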
3. Adding a slim cookbook recipe
Add to docs/COOKBOOK.md with:
- A clear use case description.
- The kernel code in a kotlin fenced block.
- Notes on assumptions (data alignment, size constraints).
- Performance numbers if you have them.
Don't worry about polishing every recipe to perfection — even a sketch with notes is useful as a starting point for someone else.
4. Code style
- Formatting: standard ktfmt defaults. Run ./gradlew :nativekt:ktlintFormat (when configured) before pushing.
- Imports: prefer explicit, fully qualified imports over wildcards in the library; wildcards are fine in tests.
- Naming: PascalCase classes, camelCase functions/properties, SCREAMING_SNAKE for compile-time constants.
- Visibility: use internal aggressively for anything that's not part of the public API. The high-level (slim package) and low-level (nativekt package) surfaces are both public; everything in engine is internal except where explicitly noted.
5. Commits and PRs
- Commit messages: imperative mood ("Add fmla helper", not "Added fmla helper"). One concept per commit.
- PRs: target main. Include:
  - What the change does.
  - Why (link to issue if applicable).
  - Test results: ./gradlew :nativekt:testDebugUnitTest output.
  - On-device verification if the change touches dispatch / bypass.
- Test the demo: even pure encoder PRs benefit from running the app and verifying the benchmark numbers don't regress.
6. Things to avoid
- Don't add public types lightly. The "user only writes slim {}" design philosophy means each new public class costs mindshare. Prefer internal helpers; only promote to public after a real consumer needs it.
- Don't bypass the encoder's golden-byte tests. Every helper has one; "trivial" instructions are exactly where mistakes hide.
- Don't break source compatibility on the high-level API. The Slim / slim() / Floats / Ints / Bytes surface is contractual. Changes there require a version bump and migration notes.
- Don't add per-vendor #ifdefs to encoder helpers. ARM64 encoding is universal; there's no Samsung-vs-Pixel difference at the instruction level. If you need vendor-specific behavior, it belongs in the runtime layer (bypass, EP probe), not the encoder.
- Don't reach into ART internals from user code. The public API intentionally hides KernelHandle, KernelTemplate, etc. behind the slim package. If you find yourself needing them, file an issue; the high-level API likely needs a new affordance.
7. Project structure reference
Slim/
├── README.md                        - top-level (GitHub landing page)
├── LICENSE                          - Apache 2.0
├── jitpack.yml                      - JitPack build config
├── mkdocs.yml                       - docs site config
├── docs/
│   ├── index.md                     - docs site landing
│   ├── guide/index.md               - teaching guide (your first kernel)
│   ├── ARCHITECTURE.md              - runtime internals
│   ├── COOKBOOK.md                  - integration model + kernel recipes
│   └── CONTRIBUTING.md              - this file
├── nativekt/                        - the library AAR module
│   └── src/
│       ├── main/
│       │   ├── kotlin/io/simdkt/
│       │   │   ├── slim/            - high-level public API
│       │   │   │   ├── Slim.kt
│       │   │   │   ├── Arm64Emitter.kt
│       │   │   │   ├── Floats.kt
│       │   │   │   ├── Ints.kt
│       │   │   │   └── Bytes.kt
│       │   │   └── nativekt/        - lower-level public API
│       │   │       ├── NativeKt.kt
│       │   │       ├── KernelTemplate.kt
│       │   │       ├── KernelHandle.kt
│       │   │       ├── Linker.kt
│       │   │       ├── Coroutines.kt
│       │   │       └── engine/      - internal
│       │   │           ├── Arm64.kt
│       │   │           ├── Asm.kt
│       │   │           ├── MemoryExecutor.kt
│       │   │           └── Trampoline.kt
│       │   └── cpp/
│       │       ├── trampoline.cpp   - JNI helpers (libnktrampoline.so)
│       │       └── CMakeLists.txt
│       └── test/                    - unit tests
└── app/                             - demo app
    └── src/main/kotlin/com/example/slim/MainActivity.kt
8. Testing matrix
The CI ideal — even if not yet automated — is:
| Layer | What's tested | How |
|---|---|---|
| Encoder | Every helper produces correct bytes | Golden bytes from clang+llvm-objdump, 49 test methods |
| Asm | Forward/backward/conditional branches resolve | Byte-equivalence vs. hand-rolled |
| Linker | Symbol resolution, error cases | Byte-equivalence + error-path tests |
| Bypass | Tier cascade lands on API 36 | On-device, Slim.isReady after init |
| Dispatch | EP hijack returns correct results | On-device, SAXPY/brightness against scalar reference |
| Concurrency | 4 threads × 50 calls, no races | On-device, comparing per-element output |
| Coroutine API | suspend dispatch, cancellation propagation | Unit tests + on-device |
Per-PR, run:
1. ./gradlew :nativekt:testDebugUnitTest (encoder, asm, linker)
2. ./gradlew :app:assembleDebug && adb install (on a real device)
3. Run the demo, verify the benchmark numbers and concurrency status
If a change touches the bypass cascade, validate on at least one Pixel and one Samsung device if available.
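The concurrency row of the matrix (4 threads × 50 calls, compared per element against a scalar reference) can be prototyped on the JVM before wiring in a real kernel. The sketch below is an assumption about how such a check might be structured, not the project's actual test: `saxpyScalar` and `countMismatches` are illustrative names, and the kernel is passed in as a plain lambda so the harness itself can be exercised without a device.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

/** Scalar reference: y[i] = a * x[i] + y[i]. */
fun saxpyScalar(a: Float, x: FloatArray, y: FloatArray) {
    for (i in x.indices) y[i] = a * x[i] + y[i]
}

/**
 * Calls `kernel` from `threads` threads, `callsPerThread` times each, and
 * returns how many calls produced output that differs from the scalar reference.
 */
fun countMismatches(
    threads: Int = 4,
    callsPerThread: Int = 50,
    kernel: (Float, FloatArray, FloatArray) -> Unit,
): Int {
    // Small integer inputs keep every result exactly representable, so a fused
    // multiply-add kernel and the unfused scalar loop should agree bit for bit.
    val x = FloatArray(1024) { it.toFloat() }
    val expected = FloatArray(1024) { 1f }.also { saxpyScalar(2f, x, it) }
    val pool = Executors.newFixedThreadPool(threads)
    try {
        val futures = List(threads) { _ ->
            pool.submit(Callable {
                (0 until callsPerThread).count { _ ->
                    val y = FloatArray(1024) { 1f }
                    kernel(2f, x, y)
                    !y.contentEquals(expected)
                }
            })
        }
        return futures.map { it.get() }.sum()
    } finally {
        pool.shutdown()
    }
}

fun main() {
    // Stand-in kernel: the scalar loop itself, so no mismatches are expected.
    println(countMismatches(kernel = ::saxpyScalar))
}
```

On a device, the lambda would be replaced by a call into the kernel under test; any nonzero count then points at a race or a dispatch problem.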
Questions
Open an issue or start a discussion. The maintainers are happy to clarify design decisions or scope changes before you spend time on something that won't merge.