This report provides a deep technical analysis of the AppDimens Dynamic library performance, following the SIMD-friendly Batching, Cache Sharding (Padded), and Inlined Hot-Path optimizations.
[!NOTE] Build variants, R8, and how to read the numbers
With code shrinking and R8 enabled on release builds (
minifyEnabled = true), the library’s hot paths can run much faster than in a typical debug APK. Example ranges observed on the project benchmark harness (same device class as elsewhere in this report):
Harness Approx. range (release + minify + R8) Calculation Test (avg) ~82 ns – ~150 ns Microbenchmark (combined / per-cycle metric as reported by the dashboard) ~125 ns – ~155 ns Macrobenchmark (estimated per-item cost under that harness with R8; not the same cell as scroll duration in ms / µs below) ~367 ns – ~380 ns All other tables and figures in this document were captured on debug builds without minify (no R8 shrinking/optimization pass on that variant). Treat debug without minify vs release with minify + R8 as different environments—do not compare cells across those scenarios without this context.
Enabling R8 full mode (
android.enableR8.fullMode=trueingradle.properties) makes optimization more aggressive; keep ProGuard/R8 rules correct when you turn it on. See R8-PROGUARD.md.
The library features a Lock-Free Padded Sharded Cache architecture with an intelligent Fast Bypass Layer.
getBatch() API exposes continuous loops for the JIT/ART to vectorize, reducing overhead per item.ScreenFactors object to prevent cache line invalidations during configuration changes.[!NOTE] Measurement Notice: Hardware metrics below were captured on physical device in a stabilized state.
Measurements captured on physical hardware in a stabilized state.
| Operation Type | Result | Status |
|---|---|---|
| Raw Math (No AR) | 2 ns | Optimal ⚡ |
| Raw Math (With AR) | 45 ns | Standard |
| Cache Hit (Single - No AR) | 5 ns | Fast ⚡ |
| Cache Hit (Single - AR) | 35 ns | Zero-Math 🚀 |
| Batch Resolution (100 items) | 169 ns | Extreme 🏎️ |
| Batch Cached (100 items - AR) | 3,773 ns | Stable ✅ |
| Persistence Load (100 entries) | 0.76 ms | Fast |
| Operation Type | Result | Status | | :— | :— | :— | | Raw Math (Single) | < 1 ns | Optimal | | Raw Math (With AR) | 2 ns | Optimal | | Cache Hit (Single) | 1 ns | Fast ⚡ | | Cache Hit (With AR) | 1 ns | Zero-Math 🚀 | | Batch Resolution (100 items) | 34 ns | Extreme | | Batch Cached (100 items - AR) | 242 ns | Optimized 🏎️ | | Persistence Load | ~0.06 ms | Fast ✅ |
Stress test executed via the new Micro + Macro Benchmark Dashboard. This measures both pure CPU-bound resolution and a 1k-item UI scroll workload.
| Metric | Result | Impact |
|---|---|---|
| Micro Combined Latency (Hot) | ~260 ns | Extreme Efficiency |
| Macro Scroll (1000 items) | ~996 ms | Fluid |
| Est. Cost per item | ~996 µs | Zero Jank for 120 FPS |
| Peak UI Load | Indistinguishable | 0% Jank Detected |
The ~260 ns / ~996 µs figures above are from debug without minify. On release with minify + R8, the same dashboard-style harness reports roughly ~125 ns – ~155 ns (micro combined) and ~367 ns – ~380 ns for the macro per-item estimate under that configuration—see the Build variants, R8 note at the top of this document.
For CalcType values of AUTO, FLUID, PERCENT, and SCALED without Aspect Ratio (applyAspectRatio = false, bit 63 == 0), the entire cache system is intentionally bypassed.
These formulas reduce to a single float multiply:
baseValue × scale. A raw multiply on Snapdragon 888 takes ~2 ns, while the fastest cache lookup (hash + atomic load + branch) takes ~5 ns. The cache would add latency, not reduce it.
This is a deliberate design decision—not a missing feature. The cache provides its full benefit only for Aspect Ratio paths (which require ln(), ~45 ns on hardware in recent captures), where amortizing the 5 ns lookup cost against that compute cost is clearly worthwhile.
| Path | Cost | Cache used? |
|---|---|---|
| SCALED / no AR (most common) | ~2 ns | ❌ Bypass |
| SCALED / with AR | ~45 ns | ✅ Cache hit ~35 ns |
| Cache hit (no AR) | ~5 ns | ✅ |
Consequence for benchmarks: DimenSdp.sdp(), .hdp(), .wdp() without AR always measure raw math performance, not cache performance. Use .sdpa() (or any *a variant) to measure the cache path.
Benchmark numbers reported in this document reflect measurements taken on a specific device (Xiaomi 2107113SG · Snapdragon 888 · Android 14) under controlled conditions. Results will vary based on:
.prof files skip JIT warm-up entirelyignoreMultiWindows modeRecommendation: always benchmark on your specific target device under representative load. The figures in this document are reference points, not guarantees.
graph TD
A[UI / Code Call] --> B{Cache Enabled?}
B -- Yes --> C{Bypass-eligible & No-AR?}
C -- Yes --> D["Fast Math Return (~2ns)"]
C -- No --> E["Inlined Hash Lookup<br/>(Padded Shards)"]
E --> F{Key Match?}
F -- Hit --> G["Return Float (~5-35ns)"]
F -- Miss --> H[Compute Once & Write back]
H --> G
D --> G
Report Updated: 2026-04-03 · AppDimens Dynamic · AppDimens Performance Lab · Snapdragon 888 Physical Hardware