@@ -13,3 +13,3 @@ # 🏆 Quick Benchmark
		```
		pip install "lapx>=0.6.0"
		pip install -U lapx
		pip install scipy
		@@ -620,3 +620,3 @@ git clone https://github.com/rathaROG/lapx.git

		See more benchmark results on all platforms [here on GitHub](https://github.com/rathaROG/lapx/actions/workflows/benchmark.yaml).
		See newer benchmark results on all platforms [here on GitHub](https://github.com/rathaROG/lapx/actions/workflows/benchmark.yaml).

		@@ -630,3 +630,3 @@ ## 🕵️‍♂️ Other Benchmarks
		```
		pip install "lapx>=0.6.0"
		pip install -U lapx
		pip install scipy
		@@ -642,14 +642,11 @@ git clone https://github.com/rathaROG/lapx.git

		To achieve optimal performance of `lapjvx()` or `lapjv()` in object-tracking applications, follow the implementation in the current [`benchmark_tracking.py`](https://github.com/rathaROG/lapx/blob/main/.github/test/benchmark_tracking.py) script.
		🆕 `lapx` [v0.7.0](https://github.com/rathaROG/lapx/releases/tag/v0.7.0) introduced [`lapjvs()`](https://github.com/rathaROG/lapx#5-the-new-function-lapjvs), a highly competitive solver. Notably, `lapjvs()` outperforms other solvers in terms of speed when the input cost matrix is square, especially for sizes 5000 and above.

		(See more results on various platforms and architectures [here](https://github.com/rathaROG/lapx/actions/runs/18620330585))
		💡 To achieve optimal performance of `lapjvx()` or `lapjv()` in object-tracking applications, follow the implementation in the current [`benchmark_tracking.py`](https://github.com/rathaROG/lapx/blob/main/.github/test/benchmark_tracking.py) script.

		👁️ See more results on various platforms and architectures [here](https://github.com/rathaROG/lapx/actions/runs/18668517507).

		<details><summary>Show the results:</summary>

		```
		Microsoft Windows [Version 10.0.26200.6899]
		(c) Microsoft Corporation. All rights reserved.

		D:\DEV\temp\lapx\.github\test>python benchmark_tracking.py

		#################################################################
		@@ -659,25 +656,26 @@ # Benchmark with threshold (cost_limit) = 0.05

		-----------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC
		-----------------------------------------------------------------------------------------------------
		10x10 \| 0.000153s 5th \| 0.000148s ✗ 4th \| 0.000056s ✓ 1st \| 0.000132s ✓ 3rd \| 0.000084s ✓ 2nd
		25x20 \| 0.000071s 5th \| 0.000064s ✗ 4th \| 0.000057s ✓ 2nd \| 0.000057s ✓ 1st \| 0.000061s ✓ 3rd
		50x50 \| 0.000159s 5th \| 0.000106s ✗ 3rd \| 0.000075s ✓ 1st \| 0.000082s ✓ 2nd \| 0.000109s ✓ 4th
		100x150 \| 0.000190s 3rd \| 0.000574s ✗ 4th \| 0.000132s ✓ 1st \| 0.000149s ✓ 2nd \| 0.000747s ✓ 5th
		250x250 \| 0.001269s 4th \| 0.001361s ✗ 5th \| 0.000542s ✓ 2nd \| 0.000519s ✓ 1st \| 0.001181s ✓ 3rd
		550x500 \| 0.003452s 1st \| 0.028483s ✓ 5th \| 0.006140s ✓ 3rd \| 0.005663s ✓ 2nd \| 0.021576s ✓ 4th
		1000x1000 \| 0.024557s 4th \| 0.023403s ✓ 3rd \| 0.008724s ✓ 1st \| 0.013036s ✓ 2nd \| 0.026147s ✓ 5th
		2000x2500 \| 0.037717s 3rd \| 1.823954s ✓ 5th \| 0.016659s ✓ 2nd \| 0.016489s ✓ 1st \| 1.580175s ✓ 4th
		5000x5000 \| 1.047033s 3rd \| 1.628817s ✓ 5th \| 0.736356s ✓ 1st \| 0.766828s ✓ 2nd \| 1.349702s ✓ 4th
		-----------------------------------------------------------------------------------------------------
		-----------------------------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC \| LAPX LAPJVS
		-----------------------------------------------------------------------------------------------------------------------
		10x10 \| 0.000270s 6th \| 0.000086s ✗ 1st \| 0.000117s ✓ 3rd \| 0.000093s ✓ 2nd \| 0.000145s ✓ 4th \| 0.000156s ✓ 5th
		25x20 \| 0.000134s 6th \| 0.000096s ✗ 1st \| 0.000104s ✓ 4th \| 0.000098s ✓ 2nd \| 0.000109s ✓ 5th \| 0.000103s ✓ 3rd
		50x50 \| 0.000216s 6th \| 0.000161s ✗ 4th \| 0.000130s ✓ 2nd \| 0.000135s ✓ 3rd \| 0.000163s ✓ 5th \| 0.000128s ✓ 1st
		100x150 \| 0.000314s 4th \| 0.001181s ✓ 6th \| 0.000307s ✓ 3rd \| 0.000304s ✓ 2nd \| 0.001002s ✓ 5th \| 0.000292s ✓ 1st
		250x250 \| 0.001926s 4th \| 0.002400s ✓ 6th \| 0.001819s ✓ 3rd \| 0.001703s ✓ 2nd \| 0.002221s ✓ 5th \| 0.001585s ✓ 1st
		550x500 \| 0.005211s 1st \| 0.046236s ✓ 6th \| 0.010141s ✓ 4th \| 0.009736s ✓ 3rd \| 0.031337s ✓ 5th \| 0.009591s ✓ 2nd
		1000x1000 \| 0.035298s 4th \| 0.062979s ✓ 6th \| 0.030774s ✓ 3rd \| 0.029720s ✓ 2nd \| 0.037911s ✓ 5th \| 0.014011s ✓ 1st
		2000x2500 \| 0.047353s 4th \| 2.537366s ✓ 6th \| 0.017684s ✓ 1st \| 0.019768s ✓ 2nd \| 2.133186s ✓ 5th \| 0.023504s ✓ 3rd
		5000x5000 \| 1.923870s 5th \| 3.216478s ✓ 6th \| 1.527883s ✓ 3rd \| 1.501829s ✓ 2nd \| 1.720995s ✓ 4th \| 0.879582s ✓ 1st
		-----------------------------------------------------------------------------------------------------------------------

		Note: LAPJV-IFT uses in-function filtering lap.lapjv(cost_limit=thresh).

		🎉 ------------------------ OVERALL RANKING ------------------------ 🎉
		1. LAPX LAPJV : 768.7409 ms \| ✅ \| 🥇x5 🥈x3 🥉x1
		2. LAPX LAPJVX : 802.9538 ms \| ✅ \| 🥇x3 🥈x5 🥉x1
		3. BASELINE SciPy : 1114.6007 ms \| ⭐ \| 🥇x1 🥉x3 🚩x2 🏳️x3
		4. LAPX LAPJVC : 2979.7809 ms \| ✅ \| 🥈x1 🥉x2 🚩x4 🏳️x2
		5. LAPX LAPJV-IFT : 3506.9110 ms \| ⚠️ \| 🥉x2 🚩x3 🏳️x4
		🎉 ------------------------------------------------------------------- 🎉
		🎉 --------------------------- OVERALL RANKING --------------------------- 🎉
		1. LAPX LAPJVS : 928.9522 ms \| ✅ \| 🥇x5 🥈x1 🥉x2 🏳️x1
		2. LAPX LAPJVX : 1563.3861 ms \| ✅ \| 🥈x7 🥉x2
		3. LAPX LAPJV : 1588.9597 ms \| ✅ \| 🥇x1 🥈x1 🥉x5 🚩x2
		4. BASELINE SciPy : 2014.5920 ms \| ⭐ \| 🥇x1 🚩x4 🏳️x1 🥴x3
		5. LAPX LAPJVC : 3927.0696 ms \| ✅ \| 🚩x2 🏳️x7
		6. LAPX LAPJV-IFT : 5866.9837 ms \| ⚠️ \| 🥇x2 🚩x1 🥴x6
		🎉 ------------------------------------------------------------------------- 🎉

		@@ -689,25 +687,26 @@

		-----------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC
		-----------------------------------------------------------------------------------------------------
		10x10 \| 0.000116s 5th \| 0.000042s ✓ 1st \| 0.000048s ✓ 3rd \| 0.000045s ✓ 2nd \| 0.000060s ✓ 4th
		25x20 \| 0.000052s 1st \| 0.000056s ✗ 3rd \| 0.000056s ✓ 4th \| 0.000054s ✓ 2nd \| 0.000062s ✓ 5th
		50x50 \| 0.000105s 5th \| 0.000104s ✗ 4th \| 0.000070s ✓ 1st \| 0.000072s ✓ 2nd \| 0.000091s ✓ 3rd
		100x150 \| 0.000169s 3rd \| 0.000882s ✓ 5th \| 0.000168s ✓ 1st \| 0.000168s ✓ 2nd \| 0.000690s ✓ 4th
		250x250 \| 0.001306s 1st \| 0.007618s ✓ 5th \| 0.002725s ✓ 3rd \| 0.002842s ✓ 4th \| 0.001719s ✓ 2nd
		550x500 \| 0.003593s 1st \| 0.054599s ✓ 5th \| 0.006124s ✓ 2nd \| 0.006191s ✓ 3rd \| 0.023443s ✓ 4th
		1000x1000 \| 0.026108s 3rd \| 0.029221s ✓ 4th \| 0.010913s ✓ 1st \| 0.011607s ✓ 2nd \| 0.031362s ✓ 5th
		2000x2500 \| 0.041879s 3rd \| 1.971637s ✓ 5th \| 0.016502s ✓ 1st \| 0.017959s ✓ 2nd \| 1.622495s ✓ 4th
		5000x5000 \| 1.197406s 3rd \| 1.463887s ✓ 5th \| 0.642493s ✓ 2nd \| 0.638527s ✓ 1st \| 1.317815s ✓ 4th
		-----------------------------------------------------------------------------------------------------
		-----------------------------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC \| LAPX LAPJVS
		-----------------------------------------------------------------------------------------------------------------------
		10x10 \| 0.000181s 6th \| 0.000080s ✗ 1st \| 0.000091s ✓ 4th \| 0.000083s ✓ 2nd \| 0.000101s ✓ 5th \| 0.000084s ✓ 3rd
		25x20 \| 0.000122s 6th \| 0.000092s ✗ 1st \| 0.000100s ✓ 2nd \| 0.000100s ✓ 3rd \| 0.000107s ✓ 5th \| 0.000103s ✓ 4th
		50x50 \| 0.000218s 6th \| 0.000149s ✗ 4th \| 0.000133s ✓ 1st \| 0.000140s ✓ 2nd \| 0.000183s ✓ 5th \| 0.000141s ✓ 3rd
		100x150 \| 0.000350s 4th \| 0.001086s ✓ 5th \| 0.000258s ✓ 1st \| 0.000279s ✓ 3rd \| 0.001142s ✓ 6th \| 0.000273s ✓ 2nd
		250x250 \| 0.001713s 5th \| 0.001953s ✓ 6th \| 0.000978s ✓ 2nd \| 0.000998s ✓ 3rd \| 0.001682s ✓ 4th \| 0.000929s ✓ 1st
		550x500 \| 0.005035s 1st \| 0.113739s ✓ 6th \| 0.010219s ✓ 4th \| 0.010151s ✓ 3rd \| 0.029781s ✓ 5th \| 0.010025s ✓ 2nd
		1000x1000 \| 0.032870s 3rd \| 0.076641s ✓ 6th \| 0.037077s ✓ 5th \| 0.035340s ✓ 4th \| 0.031529s ✓ 1st \| 0.031647s ✓ 2nd
		2000x2500 \| 0.050076s 4th \| 2.552992s ✓ 6th \| 0.017056s ✓ 1st \| 0.020267s ✓ 2nd \| 2.110527s ✓ 5th \| 0.022934s ✓ 3rd
		5000x5000 \| 2.035414s 5th \| 3.376261s ✓ 6th \| 1.640862s ✓ 4th \| 1.622361s ✓ 3rd \| 1.534738s ✓ 2nd \| 0.910615s ✓ 1st
		-----------------------------------------------------------------------------------------------------------------------

		Note: LAPJV-IFT uses in-function filtering lap.lapjv(cost_limit=thresh).

		🎉 ------------------------ OVERALL RANKING ------------------------ 🎉
		1. LAPX LAPJVX : 677.4637 ms \| ✅ \| 🥇x1 🥈x6 🥉x1 🚩x1
		2. LAPX LAPJV : 679.1001 ms \| ✅ \| 🥇x4 🥈x2 🥉x2 🚩x1
		3. BASELINE SciPy : 1270.7361 ms \| ⭐ \| 🥇x3 🥉x4 🏳️x2
		4. LAPX LAPJVC : 2997.7366 ms \| ✅ \| 🥈x1 🥉x1 🚩x5 🏳️x2
		5. LAPX LAPJV-IFT : 3528.0464 ms \| ⚠️ \| 🥇x1 🥉x1 🚩x2 🏳️x5
		🎉 ------------------------------------------------------------------- 🎉
		🎉 --------------------------- OVERALL RANKING --------------------------- 🎉
		1. LAPX LAPJVS : 976.7508 ms \| ✅ \| 🥇x2 🥈x3 🥉x3 🚩x1
		2. LAPX LAPJVX : 1689.7199 ms \| ✅ \| 🥈x3 🥉x5 🚩x1
		3. LAPX LAPJV : 1706.7731 ms \| ✅ \| 🥇x3 🥈x2 🚩x3 🏳️x1
		4. BASELINE SciPy : 2125.9788 ms \| ⭐ \| 🥇x1 🥉x1 🚩x2 🏳️x2 🥴x3
		5. LAPX LAPJVC : 3709.7903 ms \| ✅ \| 🥇x1 🥈x1 🚩x1 🏳️x5 🥴x1
		6. LAPX LAPJV-IFT : 6122.9942 ms \| ⚠️ \| 🥇x2 🚩x1 🏳️x1 🥴x5
		🎉 ------------------------------------------------------------------------- 🎉

		@@ -719,25 +718,26 @@

		-----------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC
		-----------------------------------------------------------------------------------------------------
		10x10 \| 0.000118s 5th \| 0.000049s ✓ 3rd \| 0.000047s ✓ 2nd \| 0.000045s ✓ 1st \| 0.000058s ✓ 4th
		25x20 \| 0.000054s 2nd \| 0.000064s ✓ 5th \| 0.000058s ✓ 3rd \| 0.000054s ✓ 1st \| 0.000061s ✓ 4th
		50x50 \| 0.000092s 3rd \| 0.000101s ✓ 4th \| 0.000081s ✓ 2nd \| 0.000078s ✓ 1st \| 0.000102s ✓ 5th
		100x150 \| 0.000195s 3rd \| 0.000710s ✓ 5th \| 0.000157s ✓ 2nd \| 0.000147s ✓ 1st \| 0.000647s ✓ 4th
		250x250 \| 0.001387s 4th \| 0.001640s ✓ 5th \| 0.000840s ✓ 2nd \| 0.000802s ✓ 1st \| 0.001287s ✓ 3rd
		550x500 \| 0.004195s 1st \| 0.095603s ✓ 5th \| 0.006292s ✓ 3rd \| 0.006094s ✓ 2nd \| 0.022542s ✓ 4th
		1000x1000 \| 0.024699s 3rd \| 0.037791s ✓ 5th \| 0.017332s ✓ 1st \| 0.017360s ✓ 2nd \| 0.030512s ✓ 4th
		2000x2500 \| 0.038131s 3rd \| 1.946517s ✓ 5th \| 0.016853s ✓ 2nd \| 0.016679s ✓ 1st \| 1.694409s ✓ 4th
		5000x5000 \| 1.132641s 3rd \| 1.679415s ✓ 5th \| 0.771842s ✓ 2nd \| 0.724689s ✓ 1st \| 1.162723s ✓ 4th
		-----------------------------------------------------------------------------------------------------
		-----------------------------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC \| LAPX LAPJVS
		-----------------------------------------------------------------------------------------------------------------------
		10x10 \| 0.000167s 6th \| 0.000119s ✓ 5th \| 0.000092s ✓ 2nd \| 0.000093s ✓ 3rd \| 0.000105s ✓ 4th \| 0.000085s ✓ 1st
		25x20 \| 0.000105s 6th \| 0.000097s ✓ 3rd \| 0.000096s ✓ 2nd \| 0.000096s ✓ 1st \| 0.000102s ✓ 5th \| 0.000099s ✓ 4th
		50x50 \| 0.000192s 5th \| 0.000158s ✓ 3rd \| 0.000142s ✓ 1st \| 0.000150s ✓ 2nd \| 0.000171s ✓ 4th \| 0.000193s ✓ 6th
		100x150 \| 0.000319s 4th \| 0.001089s ✓ 6th \| 0.000302s ✓ 3rd \| 0.000268s ✓ 1st \| 0.001078s ✓ 5th \| 0.000271s ✓ 2nd
		250x250 \| 0.001877s 6th \| 0.001662s ✓ 4th \| 0.000832s ✓ 2nd \| 0.000866s ✓ 3rd \| 0.001686s ✓ 5th \| 0.000810s ✓ 1st
		550x500 \| 0.004962s 1st \| 0.173261s ✓ 6th \| 0.010107s ✓ 4th \| 0.010075s ✓ 3rd \| 0.021147s ✓ 5th \| 0.009892s ✓ 2nd
		1000x1000 \| 0.034665s 5th \| 0.050879s ✓ 6th \| 0.024332s ✓ 3rd \| 0.023485s ✓ 2nd \| 0.030950s ✓ 4th \| 0.021152s ✓ 1st
		2000x2500 \| 0.050928s 4th \| 2.503577s ✓ 6th \| 0.017477s ✓ 1st \| 0.019962s ✓ 2nd \| 2.087273s ✓ 5th \| 0.027349s ✓ 3rd
		5000x5000 \| 2.111693s 5th \| 3.396578s ✓ 6th \| 1.693776s ✓ 4th \| 1.685035s ✓ 3rd \| 1.567221s ✓ 2nd \| 1.058214s ✓ 1st
		-----------------------------------------------------------------------------------------------------------------------

		Note: LAPJV-IFT uses in-function filtering lap.lapjv(cost_limit=thresh).

		🎉 ------------------------ OVERALL RANKING ------------------------ 🎉
		1. LAPX LAPJVX : 765.9472 ms \| ✅ \| 🥇x7 🥈x2
		2. LAPX LAPJV : 813.5027 ms \| ✅ \| 🥇x1 🥈x6 🥉x2
		3. BASELINE SciPy : 1201.5123 ms \| ⭐ \| 🥇x1 🥈x1 🥉x5 🚩x1 🏳️x1
		4. LAPX LAPJVC : 2912.3413 ms \| ✅ \| 🥉x1 🚩x7 🏳️x1
		5. LAPX LAPJV-IFT : 3761.8895 ms \| ✅ \| 🥉x1 🚩x1 🏳️x7
		🎉 ------------------------------------------------------------------- 🎉
		🎉 --------------------------- OVERALL RANKING --------------------------- 🎉
		1. LAPX LAPJVS : 1118.0646 ms \| ✅ \| 🥇x4 🥈x2 🥉x1 🚩x1 🥴x1
		2. LAPX LAPJVX : 1740.0298 ms \| ✅ \| 🥇x2 🥈x3 🥉x4
		3. LAPX LAPJV : 1747.1555 ms \| ✅ \| 🥇x2 🥈x3 🥉x2 🚩x2
		4. BASELINE SciPy : 2204.9078 ms \| ⭐ \| 🥇x1 🚩x2 🏳️x3 🥴x3
		5. LAPX LAPJVC : 3709.7338 ms \| ✅ \| 🥈x1 🚩x3 🏳️x5
		6. LAPX LAPJV-IFT : 6127.4199 ms \| ✅ \| 🥉x2 🚩x1 🏳️x1 🥴x5
		🎉 ------------------------------------------------------------------------- 🎉

		@@ -749,25 +749,26 @@

		-----------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC
		-----------------------------------------------------------------------------------------------------
		10x10 \| 0.000121s 5th \| 0.000046s ✓ 1st \| 0.000051s ✓ 3rd \| 0.000049s ✓ 2nd \| 0.000060s ✓ 4th
		25x20 \| 0.000055s 1st \| 0.000073s ✓ 5th \| 0.000058s ✓ 3rd \| 0.000058s ✓ 2nd \| 0.000072s ✓ 4th
		50x50 \| 0.000104s 4th \| 0.000097s ✓ 3rd \| 0.000076s ✓ 1st \| 0.000088s ✓ 2nd \| 0.000109s ✓ 5th
		100x150 \| 0.000190s 3rd \| 0.000723s ✓ 5th \| 0.000174s ✓ 2nd \| 0.000153s ✓ 1st \| 0.000708s ✓ 4th
		250x250 \| 0.001418s 4th \| 0.001791s ✓ 5th \| 0.000917s ✓ 2nd \| 0.000879s ✓ 1st \| 0.001381s ✓ 3rd
		550x500 \| 0.004009s 1st \| 0.094516s ✓ 5th \| 0.006915s ✓ 2nd \| 0.007350s ✓ 3rd \| 0.025237s ✓ 4th
		1000x1000 \| 0.022408s 2nd \| 0.046482s ✓ 5th \| 0.022091s ✓ 1st \| 0.023886s ✓ 3rd \| 0.030067s ✓ 4th
		2000x2500 \| 0.038188s 3rd \| 1.932233s ✓ 5th \| 0.017298s ✓ 1st \| 0.019071s ✓ 2nd \| 1.627810s ✓ 4th
		5000x5000 \| 1.198616s 3rd \| 1.933270s ✓ 5th \| 0.972903s ✓ 2nd \| 0.925173s ✓ 1st \| 1.355138s ✓ 4th
		-----------------------------------------------------------------------------------------------------
		-----------------------------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC \| LAPX LAPJVS
		-----------------------------------------------------------------------------------------------------------------------
		10x10 \| 0.000168s 6th \| 0.000087s ✓ 1st \| 0.000090s ✓ 2nd \| 0.000118s ✓ 5th \| 0.000104s ✓ 4th \| 0.000092s ✓ 3rd
		25x20 \| 0.000102s 4th \| 0.000113s ✓ 6th \| 0.000099s ✓ 1st \| 0.000099s ✓ 2nd \| 0.000107s ✓ 5th \| 0.000099s ✓ 3rd
		50x50 \| 0.000181s 4th \| 0.000226s ✓ 6th \| 0.000154s ✓ 1st \| 0.000162s ✓ 2nd \| 0.000164s ✓ 3rd \| 0.000210s ✓ 5th
		100x150 \| 0.000321s 4th \| 0.001070s ✓ 5th \| 0.000267s ✓ 2nd \| 0.000265s ✓ 1st \| 0.001108s ✓ 6th \| 0.000267s ✓ 3rd
		250x250 \| 0.001731s 4th \| 0.003008s ✓ 6th \| 0.001673s ✓ 3rd \| 0.001625s ✓ 2nd \| 0.001995s ✓ 5th \| 0.001460s ✓ 1st
		550x500 \| 0.004940s 1st \| 0.168662s ✓ 6th \| 0.009288s ✓ 4th \| 0.009245s ✓ 3rd \| 0.030654s ✓ 5th \| 0.009174s ✓ 2nd
		1000x1000 \| 0.034701s 5th \| 0.051617s ✓ 6th \| 0.024396s ✓ 3rd \| 0.023235s ✓ 2nd \| 0.033910s ✓ 4th \| 0.021512s ✓ 1st
		2000x2500 \| 0.050450s 4th \| 2.519313s ✓ 6th \| 0.017596s ✓ 1st \| 0.018210s ✓ 2nd \| 2.104154s ✓ 5th \| 0.027215s ✓ 3rd
		5000x5000 \| 2.027199s 5th \| 3.501020s ✓ 6th \| 1.753403s ✓ 4th \| 1.732642s ✓ 3rd \| 1.517909s ✓ 2nd \| 0.815372s ✓ 1st
		-----------------------------------------------------------------------------------------------------------------------

		Note: LAPJV-IFT uses in-function filtering lap.lapjv(cost_limit=thresh).

		🎉 ------------------------ OVERALL RANKING ------------------------ 🎉
		1. LAPX LAPJVX : 976.7065 ms \| ✅ \| 🥇x3 🥈x4 🥉x2
		2. LAPX LAPJV : 1020.4816 ms \| ✅ \| 🥇x3 🥈x4 🥉x2
		3. BASELINE SciPy : 1265.1097 ms \| ⭐ \| 🥇x2 🥈x1 🥉x3 🚩x2 🏳️x1
		4. LAPX LAPJVC : 3040.5820 ms \| ✅ \| 🥉x1 🚩x7 🏳️x1
		5. LAPX LAPJV-IFT : 4009.2317 ms \| ✅ \| 🥇x1 🥉x1 🏳️x7
		🎉 ------------------------------------------------------------------- 🎉
		🎉 --------------------------- OVERALL RANKING --------------------------- 🎉
		1. LAPX LAPJVS : 875.4009 ms \| ✅ \| 🥇x3 🥈x1 🥉x4 🏳️x1
		2. LAPX LAPJVX : 1785.6020 ms \| ✅ \| 🥇x1 🥈x5 🥉x2 🏳️x1
		3. LAPX LAPJV : 1806.9635 ms \| ✅ \| 🥇x3 🥈x2 🥉x2 🚩x2
		4. BASELINE SciPy : 2119.7929 ms \| ⭐ \| 🥇x1 🚩x5 🏳️x2 🥴x1
		5. LAPX LAPJVC : 3690.1060 ms \| ✅ \| 🥈x1 🥉x1 🚩x2 🏳️x4 🥴x1
		6. LAPX LAPJV-IFT : 6245.1169 ms \| ✅ \| 🥇x1 🏳️x1 🥴x7
		🎉 ------------------------------------------------------------------------- 🎉

		@@ -779,30 +780,28 @@

		-----------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC
		-----------------------------------------------------------------------------------------------------
		10x10 \| 0.000121s 5th \| 0.000048s ✓ 1st \| 0.000050s ✓ 3rd \| 0.000049s ✓ 2nd \| 0.000062s ✓ 4th
		25x20 \| 0.000058s 1st \| 0.000086s ✓ 5th \| 0.000063s ✓ 3rd \| 0.000060s ✓ 2nd \| 0.000072s ✓ 4th
		50x50 \| 0.000102s 3rd \| 0.000120s ✓ 5th \| 0.000085s ✓ 1st \| 0.000088s ✓ 2nd \| 0.000111s ✓ 4th
		100x150 \| 0.000187s 3rd \| 0.000713s ✓ 4th \| 0.000183s ✓ 2nd \| 0.000154s ✓ 1st \| 0.000719s ✓ 5th
		250x250 \| 0.001286s 4th \| 0.001058s ✓ 3rd \| 0.000481s ✓ 2nd \| 0.000435s ✓ 1st \| 0.001345s ✓ 5th
		550x500 \| 0.004404s 1st \| 0.098839s ✓ 5th \| 0.007206s ✓ 3rd \| 0.006994s ✓ 2nd \| 0.022169s ✓ 4th
		1000x1000 \| 0.025491s 3rd \| 0.028937s ✓ 4th \| 0.013111s ✓ 1st \| 0.013985s ✓ 2nd \| 0.030395s ✓ 5th
		2000x2500 \| 0.039780s 3rd \| 1.999674s ✓ 5th \| 0.018199s ✓ 1st \| 0.020531s ✓ 2nd \| 1.556668s ✓ 4th
		5000x5000 \| 1.142951s 4th \| 1.586818s ✓ 5th \| 0.720062s ✓ 1st \| 0.723589s ✓ 2nd \| 1.141216s ✓ 3rd
		-----------------------------------------------------------------------------------------------------
		-----------------------------------------------------------------------------------------------------------------------
		Size \| BASELINE SciPy \| LAPX LAPJV-IFT \| LAPX LAPJV \| LAPX LAPJVX \| LAPX LAPJVC \| LAPX LAPJVS
		-----------------------------------------------------------------------------------------------------------------------
		10x10 \| 0.000170s 6th \| 0.000079s ✓ 1st \| 0.000092s ✓ 4th \| 0.000085s ✓ 3rd \| 0.000108s ✓ 5th \| 0.000084s ✓ 2nd
		25x20 \| 0.000120s 5th \| 0.000144s ✓ 6th \| 0.000101s ✓ 1st \| 0.000102s ✓ 3rd \| 0.000116s ✓ 4th \| 0.000101s ✓ 2nd
		50x50 \| 0.000185s 6th \| 0.000139s ✓ 4th \| 0.000127s ✓ 1st \| 0.000135s ✓ 3rd \| 0.000158s ✓ 5th \| 0.000135s ✓ 2nd
		100x150 \| 0.000337s 4th \| 0.001089s ✓ 6th \| 0.000264s ✓ 1st \| 0.000296s ✓ 3rd \| 0.001083s ✓ 5th \| 0.000276s ✓ 2nd
		250x250 \| 0.001832s 6th \| 0.001699s ✓ 5th \| 0.000847s ✓ 2nd \| 0.000866s ✓ 3rd \| 0.001471s ✓ 4th \| 0.000813s ✓ 1st
		550x500 \| 0.005429s 1st \| 0.175252s ✓ 6th \| 0.010315s ✓ 4th \| 0.010249s ✓ 2nd \| 0.032756s ✓ 5th \| 0.010292s ✓ 3rd
		1000x1000 \| 0.040797s 5th \| 0.052160s ✓ 6th \| 0.025452s ✓ 3rd \| 0.024602s ✓ 2nd \| 0.036510s ✓ 4th \| 0.021898s ✓ 1st
		2000x2500 \| 0.048694s 4th \| 2.440901s ✓ 6th \| 0.016812s ✓ 1st \| 0.018195s ✓ 2nd \| 2.064631s ✓ 5th \| 0.028164s ✓ 3rd
		5000x5000 \| 2.152508s 5th \| 3.529325s ✓ 6th \| 1.664839s ✓ 4th \| 1.645120s ✓ 3rd \| 1.626812s ✓ 2nd \| 0.897383s ✓ 1st
		-----------------------------------------------------------------------------------------------------------------------

		Note: LAPJV-IFT uses in-function filtering lap.lapjv(cost_limit=thresh).

		🎉 ------------------------ OVERALL RANKING ------------------------ 🎉
		1. LAPX LAPJV : 759.4403 ms \| ✅ \| 🥇x4 🥈x2 🥉x3
		2. LAPX LAPJVX : 765.8850 ms \| ✅ \| 🥇x2 🥈x7
		3. BASELINE SciPy : 1214.3801 ms \| ⭐ \| 🥇x2 🥉x4 🚩x2 🏳️x1
		4. LAPX LAPJVC : 2752.7570 ms \| ✅ \| 🥉x1 🚩x5 🏳️x3
		5. LAPX LAPJV-IFT : 3716.2938 ms \| ✅ \| 🥇x1 🥉x1 🚩x2 🏳️x5
		🎉 ------------------------------------------------------------------- 🎉


		D:\DEV\temp\lapx\.github\test>
		🎉 --------------------------- OVERALL RANKING --------------------------- 🎉
		1. LAPX LAPJVS : 959.1463 ms \| ✅ \| 🥇x3 🥈x4 🥉x2
		2. LAPX LAPJVX : 1699.6506 ms \| ✅ \| 🥈x3 🥉x6
		3. LAPX LAPJV : 1718.8494 ms \| ✅ \| 🥇x4 🥈x1 🥉x1 🚩x3
		4. BASELINE SciPy : 2250.0724 ms \| ⭐ \| 🥇x1 🚩x2 🏳️x3 🥴x3
		5. LAPX LAPJVC : 3763.6443 ms \| ✅ \| 🥈x1 🚩x3 🏳️x5
		6. LAPX LAPJV-IFT : 6200.7878 ms \| ✅ \| 🥇x1 🚩x1 🏳️x1 🥴x6
		🎉 ------------------------------------------------------------------------- 🎉
		```

		</details>
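
The benchmark note singles out LAPJV-IFT as using in-function filtering (`lap.lapjv(cost_limit=thresh)`), which suggests the other columns apply the threshold after solving. The sketch below illustrates that second pattern only. `solve_square` is a hypothetical brute-force stand-in for a real solver, `filter_matches` is an illustrative helper, and treating the threshold as inclusive (`<=`) is an assumption; none of these names come from `benchmark_tracking.py`.

```python
from itertools import permutations


def solve_square(cost):
    """Brute-force optimal assignment for a small square matrix (illustration only)."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[r][p[r]] for r in range(n)),
    )
    return list(best)  # best[r] is the column assigned to row r


def filter_matches(cost, assignment, thresh):
    """Keep only (row, col) pairs whose individual cost does not exceed thresh."""
    return [(r, c) for r, c in enumerate(assignment) if cost[r][c] <= thresh]


cost = [
    [0.01, 0.90, 0.80],
    [0.70, 0.02, 0.95],
    [0.85, 0.60, 0.40],
]
assignment = solve_square(cost)                           # solve on the full matrix first
matches = filter_matches(cost, assignment, thresh=0.05)   # then drop expensive pairs
print(assignment)
print(matches)
```

Solving first and filtering afterwards keeps the assignment globally optimal over the whole matrix; passing the limit into the solver can change which pairs are chosen, which may explain the ✗ marks in the LAPJV-IFT column.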

+1

-1

lap/__init__.py

		@@ -34,3 +34,3 @@ # Copyright (c) 2025 Ratha SIV | MIT License

		__version__ = '0.7.0'
		__version__ = '0.7.1'

		@@ -37,0 +37,0 @@ from .lapmod import lapmod

+40

-56

lap/lapjvs.py

		@@ -6,3 +6,4 @@ # Copyright (c) 2025 Ratha SIV | MIT License

		from ._lapjvs import lapjvs as _lapjvs_raw
		from ._lapjvs import lapjvs_native as _lapjvs_native
		from ._lapjvs import lapjvs_float32 as _lapjvs_float32

		@@ -15,3 +16,3 @@
		jvx_like: bool = True,
		prefer_float32: bool = False,
		prefer_float32: bool = True,
		):
		@@ -43,20 +44,10 @@ """

		Returns
		-------
		One of:
		- (cost, x, y) if return_cost and not jvx_like
		- (x, y) if not return_cost and not jvx_like
		- (cost, row_indices, col_indices) if return_cost and jvx_like
		- (row_indices, col_indices) if not return_cost and jvx_like

		Where:
		- x: np.ndarray shape (n,), dtype=int. x[r] is assigned column for row r, or -1.
		- y: np.ndarray shape (m,), dtype=int. y[c] is assigned row for column c, or -1.
		- row_indices, col_indices: 1D int arrays of equal length K, listing matched (row, col) pairs.

		Notes
		-----
		- For square inputs without extension, this wraps the raw C function directly and adapts outputs.
		- The solver kernel may run in float32 (when prefer_float32=True) or native float64,
		but the returned total cost is always recomputed from the ORIGINAL input array
		to preserve previous numeric behavior and parity with lapjv/lapjvx.
		- For rectangular inputs, zero-padding exactly models the rectangular LAP.
		"""
		# Keep the original array to compute the final cost from it (preserves previous behavior)
		a = np.asarray(cost)
		@@ -66,8 +57,16 @@ if a.ndim != 2:

		if prefer_float32 and a.dtype != np.float32:
		a = a.astype(np.float32, copy=False)

		n, m = a.shape
		extend = (n != m) if (extend_cost is None) else bool(extend_cost)

		# Choose backend and working dtype for the solver only (ensure contiguity to avoid hidden copies)
		use_float32_kernel = not ((prefer_float32 is False) and (a.dtype == np.float64))
		if use_float32_kernel:
		# Run float32 kernel (casting as needed)
		_kernel = _lapjvs_float32
		work = np.ascontiguousarray(a, dtype=np.float32)
		else:
		# Run native kernel on float64 inputs
		_kernel = _lapjvs_native
		work = np.ascontiguousarray(a, dtype=np.float64)

		def _rows_cols_from_x(x_vec: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
		@@ -82,6 +81,8 @@ if x_vec.size == 0:
		if not extend:
		total_raw, x_raw, y_raw = _lapjvs_raw(a)
		x_raw = np.asarray(x_raw, dtype=np.int64)
		y_raw = np.asarray(y_raw, dtype=np.int64)
		# Square: call solver directly on chosen dtype, but compute total from ORIGINAL array
		x_raw_obj, y_raw_obj = _kernel(work)

		# Convert only what's needed; y conversion deferred if not used
		x_raw = np.asarray(x_raw_obj, dtype=np.int64)

		if jvx_like:
		@@ -95,2 +96,3 @@ rows, cols = _rows_cols_from_x(x_raw)
		else:
		y_raw = np.asarray(y_raw_obj, dtype=np.int64)
		if return_cost:
		@@ -102,6 +104,6 @@ total = float(a[np.arange(n), x_raw].sum()) if n > 0 else 0.0

		# Rectangular: zero-pad to square, solve, then trim back
		# Rectangular: zero-pad to square, solve, then trim back; compute total from ORIGINAL array
		size = max(n, m)
		padded = np.empty((size, size), dtype=a.dtype)
		padded[:n, :m] = a
		padded = np.empty((size, size), dtype=work.dtype)
		padded[:n, :m] = work
		if m < size:
		@@ -112,6 +114,4 @@ padded[:n, m:] = 0

		total_pad, x_pad, y_pad = _lapjvs_raw(padded)
		x_pad = np.asarray(x_pad, dtype=np.int64)
		y_pad = np.asarray(y_pad, dtype=np.int64)

		x_pad_obj, y_pad_obj = _kernel(padded)
		x_pad = np.asarray(x_pad_obj, dtype=np.int64)
		cols_pad_n = x_pad[:n]
		@@ -132,31 +132,15 @@

		# lapjv-like outputs
		tiny_threshold = 32
		if max(n, m) <= tiny_threshold:
		x_out = np.full(n, -1, dtype=np.int64)
		for r in range(n):
		c = int(cols_pad_n[r])
		if 0 <= c < m:
		x_out[r] = c
		# lapjv-like outputs (vectorized)
		x_out = np.full(n, -1, dtype=np.int64)
		mask_r = (cols_pad_n >= 0) & (cols_pad_n < m)
		if mask_r.any():
		x_out[mask_r] = cols_pad_n[mask_r]

		y_out = np.full(m, -1, dtype=np.int64)
		rows_pad_m = y_pad[:m]
		for c in range(m):
		r = int(rows_pad_m[c])
		if 0 <= r < n:
		y_out[c] = r
		else:
		x_out = np.full(n, -1, dtype=np.int64)
		mask_r = (cols_pad_n >= 0) & (cols_pad_n < m)
		if mask_r.any():
		r_idx = np.nonzero(mask_r)[0]
		x_out[r_idx] = cols_pad_n[mask_r]
		y_pad = np.asarray(y_pad_obj, dtype=np.int64)
		rows_pad_m = y_pad[:m]
		y_out = np.full(m, -1, dtype=np.int64)
		mask_c = (rows_pad_m >= 0) & (rows_pad_m < n)
		if mask_c.any():
		y_out[mask_c] = rows_pad_m[mask_c]

		y_out = np.full(m, -1, dtype=np.int64)
		rows_pad_m = y_pad[:m]
		mask_c = (rows_pad_m >= 0) & (rows_pad_m < n)
		if mask_c.any():
		c_idx = np.nonzero(mask_c)[0]
		y_out[c_idx] = rows_pad_m[mask_c]

		if return_cost and n > 0 and m > 0:
		@@ -163,0 +147,0 @@ mask = (x_out >= 0)
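
The rectangular path in this diff pads the cost matrix with zeros to a square, solves it, trims the result, and computes the total from the original array. A minimal sketch of that idea follows, under stated assumptions: it uses plain Python lists instead of NumPy, `solve_square` is a hypothetical brute-force stand-in for the compiled kernel, and `solve_rectangular` is an illustrative name rather than anything in `lapx`. Unlike the code above, it derives `y` from `x` instead of reading it from the solver.

```python
from itertools import permutations


def solve_square(cost):
    """Brute-force stand-in for a square LAP kernel (illustration only)."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[r][p[r]] for r in range(n)),
    )
    return list(best)  # best[r] is the column assigned to row r


def solve_rectangular(cost):
    """Zero-pad an n x m matrix to square, solve, then trim back to n rows and m columns."""
    n = len(cost)
    m = len(cost[0]) if n else 0
    size = max(n, m)

    # Padded rows and columns cost 0, so they never change which real pairs are best.
    padded = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(m):
            padded[r][c] = cost[r][c]

    x_pad = solve_square(padded)

    # Trim: matches that land on a padded row or column become -1.
    x = [-1] * n
    y = [-1] * m
    for r in range(n):
        c = x_pad[r]
        if c < m:
            x[r] = c
            y[c] = r

    # Total is summed from the ORIGINAL matrix, not from the padded working copy.
    total = sum(cost[r][c] for r, c in enumerate(x) if c >= 0)
    return total, x, y


wide = [[4.0, 1.0, 3.0],
        [2.0, 0.5, 5.0]]
tall = [[4.0, 2.0],
        [1.0, 0.5],
        [3.0, 5.0]]
print(solve_rectangular(wide))
print(solve_rectangular(tall))
```

Summing the total from the original matrix is what keeps the reported cost independent of the working dtype, which matters once the kernel may run in float32.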

+3

-3

lapx.egg-info/PKG-INFO

		Metadata-Version: 2.4
		Name: lapx
		Version: 0.7.0
		Version: 0.7.1
		Summary: Linear assignment problem solvers
		@@ -72,7 +72,7 @@ Home-page: https://github.com/rathaROG/lapx

		`lapx` features the original `lapjv()` and `lapmod()` functions, and since v0.6.0, `lapx` has introduced three additional assignment solvers:
		`lapx` features the original `lapjv()` and `lapmod()` functions, and since [v0.6.0](https://github.com/rathaROG/lapx/releases/tag/v0.6.0), `lapx` has introduced three additional assignment solvers:
		- `lapjvx()` and `lapjvxa()` — enhanced versions of [`lap.lapjv()`](https://github.com/gatagat/lap) with more flexible output formats
		- `lapjvc()` — an enhanced version of Christoph Heindl’s [`lapsolver.solve_dense()`](https://github.com/cheind/py-lapsolver) with unified output formats

		`lapx` v0.7.0 has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.
		`lapx` [v0.7.0](https://github.com/rathaROG/lapx/releases/tag/v0.7.0) has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv()`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.

		@@ -79,0 +79,0 @@ <details><summary>Read more</summary><br>

+3

-3

PKG-INFO

		Metadata-Version: 2.4
		Name: lapx
		Version: 0.7.0
		Version: 0.7.1
		Summary: Linear assignment problem solvers
		@@ -72,7 +72,7 @@ Home-page: https://github.com/rathaROG/lapx

		`lapx` features the original `lapjv()` and `lapmod()` functions, and since v0.6.0, `lapx` has introduced three additional assignment solvers:
		`lapx` features the original `lapjv()` and `lapmod()` functions, and since [v0.6.0](https://github.com/rathaROG/lapx/releases/tag/v0.6.0), `lapx` has introduced three additional assignment solvers:
		- `lapjvx()` and `lapjvxa()` — enhanced versions of [`lap.lapjv()`](https://github.com/gatagat/lap) with more flexible output formats
		- `lapjvc()` — an enhanced version of Christoph Heindl’s [`lapsolver.solve_dense()`](https://github.com/cheind/py-lapsolver) with unified output formats

		`lapx` v0.7.0 has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.
		`lapx` [v0.7.0](https://github.com/rathaROG/lapx/releases/tag/v0.7.0) has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv()`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.

		@@ -79,0 +79,0 @@ <details><summary>Read more</summary><br>

+2

-2

README.md

		@@ -21,7 +21,7 @@ <details><summary>🆕 What's new</summary><br>

		`lapx` features the original `lapjv()` and `lapmod()` functions, and since v0.6.0, `lapx` has introduced three additional assignment solvers:
		`lapx` features the original `lapjv()` and `lapmod()` functions, and since [v0.6.0](https://github.com/rathaROG/lapx/releases/tag/v0.6.0), `lapx` has introduced three additional assignment solvers:
		- `lapjvx()` and `lapjvxa()` — enhanced versions of [`lap.lapjv()`](https://github.com/gatagat/lap) with more flexible output formats
		- `lapjvc()` — an enhanced version of Christoph Heindl’s [`lapsolver.solve_dense()`](https://github.com/cheind/py-lapsolver) with unified output formats

		`lapx` v0.7.0 has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.
		`lapx` [v0.7.0](https://github.com/rathaROG/lapx/releases/tag/v0.7.0) has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv()`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.

		@@ -28,0 +28,0 @@ <details><summary>Read more</summary><br>
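
The "flexible output styles" mentioned above correspond to the two result shapes in the `lapjvs()` docstring: lapjv-like `(x, y)` vectors and jvx-like `(row_indices, col_indices)` pairs. The sketch below shows how the two relate, using plain Python lists for brevity. Both helper names are hypothetical and are not part of the `lapx` API.

```python
def rows_cols_from_x(x):
    """Convert a lapjv-like x vector (x[r] = column or -1) to matched (rows, cols) lists."""
    rows = [r for r, c in enumerate(x) if c >= 0]
    cols = [x[r] for r in rows]
    return rows, cols


def x_y_from_rows_cols(rows, cols, n, m):
    """Convert matched pairs back to lapjv-like x (length n) and y (length m) vectors."""
    x = [-1] * n
    y = [-1] * m
    for r, c in zip(rows, cols):
        x[r] = c
        y[c] = r
    return x, y


x = [2, -1, 0]                       # row 0 -> col 2, row 1 unmatched, row 2 -> col 0
rows, cols = rows_cols_from_x(x)
x_back, y_back = x_y_from_rows_cols(rows, cols, n=3, m=4)
print(rows, cols)
print(x_back, y_back)
```

The pair form is convenient when only matched items matter, as in tracking, while the vector form makes unmatched rows and columns explicit.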

+95

-64

src/_lapjvs/lapjvs.cpp

		@@ -10,10 +10,18 @@ #include <functional>
		"This module wraps LAPJVS - Jonker-Volgenant linear sum assignment algorithm (Scalar-only, no AVX2/SIMD).";
		static char lapjvs_docstring[] =
		"Solves the linear sum assignment problem (Scalar-only).";
		static char lapjvs_native_docstring[] =
		"Solves the linear sum assignment problem following the input dtype (float32 or float64). Returns (row_ind, col_ind).";
		static char lapjvs_float32_docstring[] =
		"Solves the linear sum assignment problem in float32 (casts inputs if needed). Returns (row_ind, col_ind).";

		static PyObject *py_lapjvs(PyObject *self, PyObject *args, PyObject *kwargs);
		static PyObject *py_lapjvs_native(PyObject *self, PyObject *args, PyObject *kwargs);
		static PyObject *py_lapjvs_float32(PyObject *self, PyObject *args, PyObject *kwargs);

		static PyMethodDef module_functions[] = {
		{"lapjvs", reinterpret_cast<PyCFunction>(py_lapjvs),
		METH_VARARGS | METH_KEYWORDS, lapjvs_docstring},
		// Keep a friendly alias: lapjvs follows native dtype by default
		{"lapjvs", reinterpret_cast<PyCFunction>(py_lapjvs_native),
		METH_VARARGS | METH_KEYWORDS, lapjvs_native_docstring},
		{"lapjvs_native", reinterpret_cast<PyCFunction>(py_lapjvs_native),
		METH_VARARGS | METH_KEYWORDS, lapjvs_native_docstring},
		{"lapjvs_float32", reinterpret_cast<PyCFunction>(py_lapjvs_float32),
		METH_VARARGS | METH_KEYWORDS, lapjvs_float32_docstring},
		{NULL, NULL, 0, NULL}
		@@ -64,49 +72,43 @@ };
		template <typename F>
		static always_inline double call_lap(int dim, const void *restrict cost_matrix,
		bool verbose,
		int *restrict row_ind, int *restrict col_ind,
		void *restrict u, void *restrict v) {
		double lapcost;
		static always_inline void call_lap(int dim, const void *restrict cost_matrix,
		bool verbose,
		int *restrict row_ind, int *restrict col_ind,
		void *restrict v) {
		Py_BEGIN_ALLOW_THREADS
		auto cost_matrix_typed = reinterpret_cast<const F*>(cost_matrix);
		auto u_typed = reinterpret_cast<F*>(u);
		auto v_typed = reinterpret_cast<F*>(v);
		if (verbose) {
		lapcost = lapjvs<true>(dim, cost_matrix_typed, row_ind, col_ind, u_typed, v_typed);
		lapjvs<true>(dim, cost_matrix_typed, row_ind, col_ind, v_typed);
		} else {
		lapcost = lapjvs<false>(dim, cost_matrix_typed, row_ind, col_ind, u_typed, v_typed);
		lapjvs<false>(dim, cost_matrix_typed, row_ind, col_ind, v_typed);
		}
		Py_END_ALLOW_THREADS
		return lapcost;
		}

		static PyObject *py_lapjvs(PyObject *self, PyObject *args, PyObject *kwargs) {
		// Native dtype entry point: lapjvs_native(cost_matrix, verbose=False)
		// - Accepts only float32 or float64 without casting; errors otherwise.
		// - Dispatches to float or double kernel based on the input dtype.
		// - Returns (row_ind, col_ind).
		static PyObject *py_lapjvs_native(PyObject *self, PyObject *args, PyObject *kwargs) {
		PyObject *cost_matrix_obj;
		int verbose = 0;
		int force_doubles = 0;
		int return_original = 0;
		static const char *kwlist[] = {
		"cost_matrix", "verbose", "force_doubles", "return_original", NULL};
		static const char *kwlist[] = {"cost_matrix", "verbose", NULL};
		if (!PyArg_ParseTupleAndKeywords(
		args, kwargs, "O|pbb", const_cast<char**>(kwlist),
		&cost_matrix_obj, &verbose, &force_doubles, &return_original)) {
		args, kwargs, "O|p", const_cast<char**>(kwlist),
		&cost_matrix_obj, &verbose)) {
		return NULL;
		}

		// Restore fast default: process as float32 unless force_doubles is set.
		pyarray cost_matrix_array;
		bool float32 = true;
		cost_matrix_array.reset(PyArray_FROM_OTF(
		cost_matrix_obj, NPY_FLOAT32,
		NPY_ARRAY_IN_ARRAY | (force_doubles ? 0 : NPY_ARRAY_FORCECAST)));
		// Ensure array view; do not cast dtype.
		pyarray cost_matrix_array(PyArray_FROM_OTF(
		cost_matrix_obj, NPY_NOTYPE, NPY_ARRAY_IN_ARRAY));
		if (!cost_matrix_array) {
		PyErr_Clear();
		float32 = false;
		cost_matrix_array.reset(PyArray_FROM_OTF(
		cost_matrix_obj, NPY_FLOAT64, NPY_ARRAY_IN_ARRAY));
		if (!cost_matrix_array) {
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\" must be a numpy array of float32 or float64 dtype");
		return NULL;
		}
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\" must be a numpy array");
		return NULL;
		}
		int typ = PyArray_TYPE(cost_matrix_array.get());
		if (typ != NPY_FLOAT32 && typ != NPY_FLOAT64) {
		PyErr_SetString(PyExc_TypeError, "\"cost_matrix\" must be float32 or float64 for lapjvs_native()");
		return NULL;
		}

		@@ -137,33 +139,62 @@ auto ndims = PyArray_NDIM(cost_matrix_array.get());

		double lapcost;

		if (return_original) {
		// Allocate NumPy arrays for u, v only if they are returned.
		pyarray u_array(PyArray_SimpleNew(
		1, ret_dims, float32? NPY_FLOAT32 : NPY_FLOAT64));
		pyarray v_array(PyArray_SimpleNew(
		1, ret_dims, float32? NPY_FLOAT32 : NPY_FLOAT64));
		auto u = PyArray_DATA(u_array.get());
		auto v = PyArray_DATA(v_array.get());
		if (float32) {
		lapcost = call_lap<float>(dim, cost_matrix, verbose, row_ind, col_ind, u, v);
		} else {
		lapcost = call_lap<double>(dim, cost_matrix, verbose, row_ind, col_ind, u, v);
		}
		return Py_BuildValue("(OO(dOO))",
		row_ind_array.get(), col_ind_array.get(), lapcost,
		u_array.get(), v_array.get());
		if (typ == NPY_FLOAT32) {
		std::unique_ptr<float[]> v(new float[dim]);
		call_lap<float>(dim, cost_matrix, verbose, row_ind, col_ind, v.get());
		} else {
		// Temporary heap buffers for u, v to avoid NumPy allocation overhead.
		if (float32) {
		std::unique_ptr<float[]> u(new float[dim]);
		std::unique_ptr<float[]> v(new float[dim]);
		lapcost = call_lap<float>(dim, cost_matrix, verbose, row_ind, col_ind, u.get(), v.get());
		} else {
		std::unique_ptr<double[]> u(new double[dim]);
		std::unique_ptr<double[]> v(new double[dim]);
		lapcost = call_lap<double>(dim, cost_matrix, verbose, row_ind, col_ind, u.get(), v.get());
		}
		return Py_BuildValue("(dOO)", lapcost, row_ind_array.get(), col_ind_array.get());
		std::unique_ptr<double[]> v(new double[dim]);
		call_lap<double>(dim, cost_matrix, verbose, row_ind, col_ind, v.get());
		}

		return Py_BuildValue("(OO)", row_ind_array.get(), col_ind_array.get());
		}

		// Float32 entry point: lapjvs_float32(cost_matrix, verbose=False)
		// - Casts to float32 if needed, but avoids a copy when already float32.
		// - Returns (row_ind, col_ind).
		static PyObject *py_lapjvs_float32(PyObject *self, PyObject *args, PyObject *kwargs) {
		PyObject *cost_matrix_obj;
		int verbose = 0;
		static const char *kwlist[] = {"cost_matrix", "verbose", NULL};
		if (!PyArg_ParseTupleAndKeywords(
		args, kwargs, "O|p", const_cast<char**>(kwlist),
		&cost_matrix_obj, &verbose)) {
		return NULL;
		}

		// Allow casting to float32, avoid copy if dtype already matches.
		pyarray cost_matrix_array(PyArray_FROM_OTF(
		cost_matrix_obj, NPY_FLOAT32, NPY_ARRAY_IN_ARRAY | NPY_ARRAY_FORCECAST));
		if (!cost_matrix_array) {
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\" must be convertible to float32");
		return NULL;
		}

		auto ndims = PyArray_NDIM(cost_matrix_array.get());
		if (ndims != 2) {
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\" must be a square 2D numpy array");
		return NULL;
		}
		auto dims = PyArray_DIMS(cost_matrix_array.get());
		if (dims[0] != dims[1]) {
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\" must be a square 2D numpy array");
		return NULL;
		}
		int dim = static_cast<int>(dims[0]);
		if (dim <= 0) {
		PyErr_SetString(PyExc_ValueError, "\"cost_matrix\"'s shape is invalid or too large");
		return NULL;
		}

		auto cost_matrix = PyArray_DATA(cost_matrix_array.get());

		npy_intp ret_dims[] = {dim, 0};
		pyarray row_ind_array(PyArray_SimpleNew(1, ret_dims, NPY_INT));
		pyarray col_ind_array(PyArray_SimpleNew(1, ret_dims, NPY_INT));
		auto row_ind = reinterpret_cast<int*>(PyArray_DATA(row_ind_array.get()));
		auto col_ind = reinterpret_cast<int*>(PyArray_DATA(col_ind_array.get()));

		std::unique_ptr<float[]> v(new float[dim]);
		call_lap<float>(dim, cost_matrix, verbose, row_ind, col_ind, v.get());

		return Py_BuildValue("(OO)", row_ind_array.get(), col_ind_array.get());
		}
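
The two entry points above differ mainly in how the kernel's element type is chosen: `lapjvs_native` inspects the array dtype and rejects anything other than float32 or float64, while `lapjvs_float32` always casts. The sketch below illustrates the native-style dispatch without the Python C API. The `DType` enum and the `sum_as` and `sum_native` functions are hypothetical stand-ins (a summation replaces the real solver), not part of `lapx`.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical dtype tag standing in for NumPy's NPY_FLOAT32 / NPY_FLOAT64.
enum class DType { Float32, Float64, Other };

// Typed kernel: reinterprets the untyped buffer as F, much like call_lap<F>
// does with cost_matrix. Here it only sums the elements.
template <typename F>
double sum_as(const void *data, std::size_t n) {
  const F *typed = static_cast<const F *>(data);
  double total = 0.0;
  for (std::size_t i = 0; i < n; i++) total += static_cast<double>(typed[i]);
  return total;
}

// Native-style dispatch: pick the kernel from the dtype, never cast,
// and report an error for unsupported element types.
inline double sum_native(const void *data, std::size_t n, DType dtype) {
  switch (dtype) {
    case DType::Float32:
      return sum_as<float>(data, n);
    case DType::Float64:
      return sum_as<double>(data, n);
    default:
      throw std::invalid_argument("dtype must be float32 or float64");
  }
}
```

Calling `sum_native(buf, n, DType::Float32)` on a `float` buffer reads the data in place, whereas a float32-only entry point would first have to convert a `double` buffer into a temporary `float` copy.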

+31

-33

src/_lapjvs/lapjvs.h

		@@ -5,2 +5,3 @@ #include <cassert>
		#include <memory>
		#include <vector>

		@@ -59,13 +60,22 @@ #ifdef __GNUC__
		/// @param colsol out row assigned to column in solution / size dim
		/// @param u out dual variables, row reduction numbers / size dim
		/// @param v out dual variables, column reduction numbers / size dim
		/// @return achieved minimum assignment cost
		/// @param v inout dual variables, column reduction numbers / size dim
		template <bool verbose, typename idx, typename cost>
		cost lapjvs(int dim, const cost *restrict assign_cost, idx *restrict rowsol,
		idx *restrict colsol, cost *restrict u, cost *restrict v) {
		auto collist = std::make_unique<idx[]>(dim); // list of columns to be scanned in various ways.
		auto matches = std::make_unique<idx[]>(dim); // counts how many times a row could be assigned.
		auto d = std::make_unique<cost[]>(dim); // 'cost-distance' in augmenting path calculation.
		auto pred = std::make_unique<idx[]>(dim); // row-predecessor of column in augmenting/alternating path.
		void lapjvs(int dim, const cost *restrict assign_cost, idx *restrict rowsol,
		idx *restrict colsol, cost *restrict v) {
		// Reuse per-thread buffers to avoid per-call allocations
		static thread_local std::vector<idx> collist_vec;
		static thread_local std::vector<idx> matches_vec;
		static thread_local std::vector<idx> pred_vec;
		static thread_local std::vector<cost> d_vec;

		if ((int)collist_vec.size() < dim) collist_vec.resize(dim);
		if ((int)matches_vec.size() < dim) matches_vec.resize(dim);
		if ((int)pred_vec.size() < dim) pred_vec.resize(dim);
		if ((int)d_vec.size() < dim) d_vec.resize(dim);

		idx *restrict collist = collist_vec.data(); // list of columns to be scanned.
		idx *restrict matches = matches_vec.data(); // counts how many times a row could be assigned.
		cost *restrict d = d_vec.data(); // 'cost-distance' in augmenting path calculation.
		idx *restrict pred = pred_vec.data(); // row-predecessor of column in augmenting/alternating path.

		// init how many times a row will be assigned in the column reduction.
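
The hunk above replaces per-call `std::make_unique` allocations with `static thread_local` vectors that only ever grow. A minimal, self-contained sketch of that pattern follows. The `scratch_buffer` and `sum_of_squares` names are hypothetical and only illustrate the idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of lapx): returns a per-thread scratch buffer
// holding at least n elements. The vector only grows, so repeated calls with
// the same or a smaller n perform no heap allocation.
template <typename T>
T *scratch_buffer(std::size_t n) {
  static thread_local std::vector<T> buf;
  if (buf.size() < n) buf.resize(n);
  return buf.data();
}

// Example routine that needs O(n) temporary storage on every call:
// stores the squares in the scratch buffer, then sums them.
inline double sum_of_squares(const double *x, std::size_t n) {
  double *tmp = scratch_buffer<double>(n);
  for (std::size_t i = 0; i < n; i++) tmp[i] = x[i] * x[i];
  double total = 0.0;
  for (std::size_t i = 0; i < n; i++) total += tmp[i];
  return total;
}
```

Note that this helper hands out a single buffer per element type per thread, so a function that needs several live buffers of the same type (as `lapjvs` does for `collist`, `matches`, and `pred`) still needs one static vector per buffer, as in the diff. The memory is also retained for the lifetime of the thread, which trades a larger resident footprint for fewer allocations.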
		@@ -91,3 +101,3 @@ for (idx i = 0; i < dim; i++) {
		if (++matches[imin] == 1) {
		// init assignment if minimum row assigned for first time.
		// init assignment if minimum row assigned for the first time.
		rowsol[imin] = j;
		@@ -104,3 +114,3 @@ colsol[j] = imin;
		// REDUCTION TRANSFER
		auto free = matches.get(); // list of unassigned rows.
		idx *restrict free_rows = matches; // list of unassigned rows (reuse matches' storage).
		idx numfree = 0;
		@@ -110,4 +120,4 @@ for (idx i = 0; i < dim; i++) {
		if (matches[i] == 0) { // fill list of unassigned 'free' rows.
		free[numfree++] = i;
		} else if (matches[i] == 1) { // transfer reduction from rows that are assigned once.
		free_rows[numfree++] = i;
		} else if (matches[i] == 1) { // transfer reduction from rows assigned once.
		idx j1 = rowsol[i];
		@@ -117,5 +127,4 @@ cost min = std::numeric_limits<cost>::max();
		if (j != j1) {
		if (local_cost[j] - v[j] < min) {
		min = local_cost[j] - v[j];
		}
		cost cand = local_cost[j] - v[j];
		if (cand < min) min = cand;
		}
		@@ -136,3 +145,3 @@ }
		while (k < prevnumfree) {
		idx i = free[k++];
		idx i = free_rows[k++];

		@@ -159,5 +168,5 @@ // find minimum and second minimum reduced cost over columns.
		if (vj1_lowers) {
		free[--k] = i0;
		free_rows[--k] = i0;
		} else {
		free[numfree++] = i0;
		free_rows[numfree++] = i0;
		}
		@@ -174,3 +183,3 @@ }
		idx endofpath;
		idx freerow = free[f]; // start row of augmenting path.
		idx freerow = free_rows[f]; // start row of augmenting path.
		if (verbose) {
		@@ -265,15 +274,4 @@ printf("lapjvs: AUGMENT SOLUTION row %d [%d / %d]\n",

		// calculate optimal cost.
		cost lapcost = 0;
		for (idx i = 0; i < dim; i++) {
		const cost *local_cost = &assign_cost[i * dim];
		idx j = rowsol[i];
		u[i] = local_cost[j] - v[j];
		lapcost += local_cost[j];
		}
		if (verbose) {
		printf("lapjvs: optimal cost calculated\n");
		}

		return lapcost;
		// Final cost and row duals (u) are not computed here anymore, since the Python
		// wrapper recomputes the total cost from the original input for numeric parity.
		}
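
Since the kernel no longer returns `lapcost`, the total has to be derived from the cost matrix and the assignment, as the closing comment notes. The sketch below shows one way to do that, assuming a row-major `dim x dim` matrix and a `rowsol` array where `rowsol[i]` is the column assigned to row `i` (matching the removed loop). The `assignment_cost` name is hypothetical, and accumulating in `double` is an assumption about what numeric parity requires.

```cpp
#include <cstddef>

// Hypothetical helper (not part of lapx): total cost of an assignment on a
// row-major dim x dim cost matrix. rowsol[i] is the column assigned to row i.
// The sum is accumulated in double regardless of the matrix element type.
template <typename Cost>
double assignment_cost(int dim, const Cost *cost_matrix, const int *rowsol) {
  double total = 0.0;
  for (int i = 0; i < dim; i++) {
    // Index with size_t so i * dim cannot overflow int for large matrices.
    std::size_t offset = static_cast<std::size_t>(i) * dim + rowsol[i];
    total += static_cast<double>(cost_matrix[offset]);
  }
  return total;
}
```

Computing the sum from the caller's original matrix, rather than from a float32 copy used by the solver, keeps the reported cost independent of the precision the kernel ran in.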

NOTICE

Sorry, the diff of this file is not supported yet

src/_lapjv/_lapjv.cpp

Sorry, the diff of this file is too big to display

src/_lapjv/_lapjvx.cpp

Sorry, the diff of this file is too big to display

		@@ -21,7 +21,7 @@ <details><summary>🆕 What's new</summary><br>

		`lapx` features the original `lapjv()` and `lapmod()` functions, and since v0.6.0, `lapx` has introduced three additional assignment solvers:
		`lapx` features the original `lapjv()` and `lapmod()` functions, and since [v0.6.0](https://github.com/rathaROG/lapx/releases/tag/v0.6.0), `lapx` has introduced three additional assignment solvers:
		- `lapjvx()` and `lapjvxa()` — enhanced versions of [`lap.lapjv()`](https://github.com/gatagat/lap) with more flexible output formats
		- `lapjvc()` — an enhanced version of Christoph Heindl’s [`lapsolver.solve_dense()`](https://github.com/cheind/py-lapsolver) with unified output formats

		`lapx` v0.7.0 has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.
		`lapx` [v0.7.0](https://github.com/rathaROG/lapx/releases/tag/v0.7.0) has introduced a new function: `lapjvs()` — an enhanced version of Vadim Markovtsev’s [`lapjv()`](https://github.com/src-d/lapjv), supporting both rectangular and square cost matrices, with flexible output styles.

		@@ -28,0 +28,0 @@ <details><summary>Read more</summary><br>
