Compress a sparse matrix using Compressed sparse row (CSR, CRS or Yale format).
These are all the same form of compression (ignore new Yale).
Input may be any 2d data structure (list of lists, etc): e.g
[[0 0 0 0],
[5 8 0 0],
[0 0 3 0],
[0 6 0 0]]
And the output should be three 1d data structures (list etc), that denote the outputs A
, IA
and JA
, for example
[5, 8, 3, 6]
[0, 0, 2, 3, 4]
[0, 1, 2, 1,]
The process is described by wikipedia:
The array
A
is of lengthNNZ
and holds all the nonzero entries ofM
in left-to-right top-to-bottom ("row-major") order.The array
IA
is of lengthm + 1
. It is defined by this recursive definition:
IA[0] = 0
IA[i] = IA[i − 1] + (number of nonzero elements on the (i − 1)-th row in the original matrix)
Thus, the first
m
elements ofIA
store the index intoA
of the first nonzero element in each row ofM
, and the last elementIA[m]
storesNNZ
, the number of elements inA
, which can be also thought of as the index inA
of first element of a phantom row just beyond the end of the matrixM
. The values of thei
-th row of the original matrix is read from the elementsA[IA[i]]
toA[IA[i + 1] − 1]
(inclusive on both ends), i.e. from the start of one row to the last index just before the start of the next.[5]The third array,
JA
, contains the column index inM
of each element ofA
and hence is of lengthNNZ
as well.
If your language doesn't support actual data structures, input and output may be text.
Test cases
Input 1:
[[0 0 0 0],
[5 8 0 0],
[0 0 3 0],
[0 6 0 0]]
Output 1:
[ 5, 8, 3, 6 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1, ]
Input 2
[[10 20 0 0 0 0],
[0 30 0 40 0 0],
[0 0 50 60 70 0],
[0 0 0 0 0 80]]
Output 2:
[ 10 20 30 40 50 60 70 80 ]
[ 0 2 4 7 8 ]
[ 0 1 1 3 2 3 4 5 ]
Input 3:
[[0 0 0],
[0 0 0],
[0 0 0]]
Output 3:
[ ]
[ 0 0 0 0 ]
[ ]
Input 4:
[[1 1 1],
[1 1 1],
[1 1 1]]
Output 4:
[ 1 1 1 1 1 1 1 1 1 ]
[ 0 3 6 9 ]
[ 0 1 2 0 1 2 0 1 2 ]
Input 5:
[[0 0 0 0],
[5 -9 0 0],
[0 0 0.3 0],
[0 -400 0 0]]
Output 5:
[ 5, -9, 0.3, -400 ]
[ 0, 0, 2, 3, 4 ]
[ 0, 1, 2, 1, ]
Assume inputs may contain any real number, you need not consider mathematical symbols or exponential representation (e.g. 5,000 will never be entered as 5e3). You will not need to handle inf
, -inf
, NaN
or any other 'pseudo-numbers'. You may output a different representation of the number (5,000 may be output as 5e3 if you so choose).
Scoring
This is a code-golf, fewest bytes wins.
15 Answers 15
MATL, 19 bytes
!3#f!Dx0Gg!XsYshDq!
Input uses ;
as row separator.
Try it online! Or verify all test cases: 1, 2, 3, 4, 5.
Explanation
! % Implicit input. Transpose
3#f % 3-output version of find: it takes all nonzero values and pushes
% their column indices, row indices, and values, as column vectors
! % Transpose into a row vector
D % Display (and pop) vector of values
x % Delete vector of row values
0 % Push 0
G % Push input
g % Convert to logical: nonzeros become 1
! % Transpose
Xs % Sum of columns. Gives a row vector
Ys % Cumulative sum
h % Prepend the 0 that's below on the stack
D % Display (and pop) that vector
q % Subtract 1 from the vector of row indices
! % Transpose into a row vector. Implicitly display
Mathematica, 78 bytes
{a=SparseArray@#;a@"NonzeroValues",a@"RowPointers",Join@@a@"ColumnIndices"-1}&
APL (Dyalog), (削除) 31 (削除ここまで) 28 chars or (削除) 36 (削除ここまで) 33 bytes*
Requires ⎕IO←0
for zero based indexing. I/O is list of lists.
{(∊d)(0,+\≢ ̈d←⍵~ ̈0)(∊⍸ ̈⍵≠0)}
{
...}
anonymous function where the argument is represented by ⍵
(
...)(
...)(
...)
return a list of three things:
⍵≠0
Boolean where the argument differs from 0
⍸ ̈
ɩndices of those for each sub-list
∊
εnlist (flatten) to combine into single list
⍵~ ̈0
remove zeros from each sub-list of the argument
d←
store that as d
≢ ̈
tally each
+\
cumulative sum
0,
prepend a zero
∊d
εnlist (flatten) d to combine into single list
* To run in Dyalog Classic, simply replace ⍸
with ⎕U2378
.
-
\$\begingroup\$ Nice, I don't understand the input format though?
f 4 4⍴
and then the values? \$\endgroup\$AncientSwordRage– AncientSwordRage2017年07月05日 12:08:05 +00:00Commented Jul 5, 2017 at 12:08 -
\$\begingroup\$ @Pureferret the Code defines the function
f
. The Input is really a REPL, which callsf
on the result of4 4⍴...
which reshapes the data into a 4×4 matrix. \$\endgroup\$Adám– Adám2017年07月05日 12:09:43 +00:00Commented Jul 5, 2017 at 12:09 -
1\$\begingroup\$ Rho for reshapes. I get it! \$\endgroup\$AncientSwordRage– AncientSwordRage2017年07月05日 12:12:24 +00:00Commented Jul 5, 2017 at 12:12
-
1\$\begingroup\$ @Pureferret I've updated the Try it online! link to better show test cases. \$\endgroup\$Adám– Adám2017年07月05日 12:21:23 +00:00Commented Jul 5, 2017 at 12:21
Haskell, 87 bytes
f s|a<-filter(/=0)<$>s=(id=<<a,scanl(+)0$length<$>a,s>>= \t->[i|(i,e)<-zip[0..]t,e/=0])
How it works:
a<-filter(/=0)<$>s -- let a be the list of lists with all 0 removed]
-- e.g. [[1,0,0],[0,3,4]] -> [[1],[3,4]]
-- return a triple of
id=<<a -- a concatenated into a single list -> A
scanl(+)0$length<$>a -- partial sums of the length of the sublists of a
-- strating with an additional 0 -> IA
s>>= -- map the lambda over the sublists of s and concatenate
-- into a single list
\t->[i|(i,e)<-zip[0..]t,e/=0] -- the indices of the non-zero elements -> JA
Python+SciPy, 79 bytes
i guess built-ins were not forbidden
from scipy.sparse import*
A=csr_matrix(input())
print A.data,A.indptr,A.indices
Accepts input in the format [[0, 0, 0, 0],[5, 8, 0, 0],[0, 0, 3, 0],[0, 6, 0, 0]]
R, 70 bytes
function(m,M=t(m))list(M[x<-!!M],diffinv(colSums(x)),which(x,T)[,1]-1)
This is an old, matrix challenge that didn't get an R answer for 3 years! Having to do this in row-major rather than column-major order costs 8 bytes (,M=t(m))
).
Python 3, 96 bytes
lambda m:list(map(list,zip(*[[e,i,j] for i,r in enumerate(m) for j,e in enumerate(r) if e!=0])))
PHP, 107 bytes
<?for($y=[$c=0];$r=$_GET[+$l++];)foreach($r as$k=>$v)!$v?:[$x[]=$v,$z[]=$k,$y[$l]=++$c];var_dump($x,$y,$z);
PHP, 109 bytes
<?$y=[$c=0];foreach($_GET as$r){foreach($r as$k=>$v)if($v){$x[]=$v;$z[]=$k;$c++;}$y[]=$c;}var_dump($x,$y,$z);
-
\$\begingroup\$ Does this need the numbers to be strings? \$\endgroup\$AncientSwordRage– AncientSwordRage2017年07月05日 14:06:00 +00:00Commented Jul 5, 2017 at 14:06
-
1\$\begingroup\$ @Pureferret Any Input in PHP is a string or an array of strings. I have not casted the input so if you wish that the output is purely int replace
$x[]=$v
with$x[]=+$v
\$\endgroup\$Jörg Hülsermann– Jörg Hülsermann2017年07月05日 14:21:41 +00:00Commented Jul 5, 2017 at 14:21
JavaScript (ES6), 117 bytes
a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]
Input is a 2D array of numbers and output is an array of [A, IA, JA]
.
Explained
a=>[
a.map((b,i) => ( // map each matrix row
b = b.filter((x,c) => x // filter to only non-zero elements
&& o.push(c) // and add this index to JA
)
m[i+1] = m[i] + b.length, // set next value of IA
b // and return filtered row
),
m=[0],o=[] // initialize IA (m) and JA (o)
).reduce((x,y) => x.concat(y)), // flatten the non-zero matrix
m,o] // append IA and JA
Tests
let f=
a=>[a.map((b,i)=>(b=b.filter((x,c)=>x&&o.push(c)),m[i+1]=m[i]+b.length,b),m=[0],o=[]).reduce((x,y)=>x.concat(y)),m,o]
let run=x=>O.innerHTML+=f(x).map(s=>`[${s.join`, `}]`).join`\n`+"\n\n"
run([[0,0,0,0],[5,8,0,0],[0,0,3,0],[0,6,0,0]])
run([[10,20,0,0,0,0],[0,30,0,40,0,0],[0,0,50,60,70,0],[0,0,0,0,0,80]])
run([[0,0,0],[0,0,0],[0,0,0]])
run([[1,1,1],[1,1,1],[1,1,1]])
run([[0,0,0,0],[5,-9,0,0],[0,0,0.3,0],[0,-400,0,0]])
<pre id=O></pre>
Japt, (削除) 31 (削除ここまで) (削除) 27 (削除ここまで) 17 bytes
Hopefully this output format is OK.
cf pUmè iT å+ Ucð
cf pUmè iT å+ Ucð :Implicit input of 2D array U
c :Flat map
f : Filter
p :Push the following two elements
Um : First element: Map U
è : Count the truthy elements in each
i : Prepend
T : Zero
å+ : Cumulatively reduce by addition
Uc : Second element: Flat map U
ð : 0-based indices of truthy elements
-
\$\begingroup\$ I just ran the other examples though and it works \$\endgroup\$AncientSwordRage– AncientSwordRage2017年07月05日 12:08:27 +00:00Commented Jul 5, 2017 at 12:08
Python 2, 115 bytes
lambda m:zip(*[[v,i]for k in m for i,v in enumerate(k)if v])+[reduce(lambda a,b:a+[len(b)-b.count(0)+a[-1]],m,[0])]
Output is [A, JA, IA]
Perl 6, 84 bytes
{.flatmap(*.grep(+*)),(0,|[\+] .map(+*.grep(+*))),.flat.kv.flatmap:{$^a%.[0]xx?$^b}}
The single matrix argument is in $_
.
.flatmap(*.grep(+*))
selects the nonzero elements of the entire matrix.[\+] .map(+*.grep(+*))
is the triangular reduction of the number of elements in each row (which some languages callscan
).(0,|...)
prepends a zero to that list..flat.kv
produces an indexed list of all elements of the matrix..flatmap: { $^a % .[0] xx ?$^b }
flat-maps over the modulus of each index by the number of columns in the array (.[0]
, the number of elements in the first row), replicated by the element itself, interpreted as a boolean. That is, nonzero elements are replicated once, and zero elements are replicated zero times (ie, removed).
Julia, (削除) 66 (削除ここまで) 63 bytes
using SparseArrays
n*a=getfield(sparse(a'),n)
!a=5a,3a.-1,4a.-1
SparseArrays
is a standard library that stores with CSC instead of CSR (column instead of row or something), that's why we need to transpose (a'
).
-6 bytes if 1-indexing was allowed
overloads *
with getfield
to save a few bytes
Scala, 82 bytes
x=>x.flatMap(_.zipWithIndex).filter(_._1!=0).unzip->x.scanLeft(0)(_+_.count(_!=0))
Returns ((A, JA), IA)
Explanation:
x => //The sparse matrix to be smooshed
x.flatMap( //Do the following operation to each row, then flatten the result
_.zipWithIndex //Zip each element with its column (makes tuples)
).filter( //Keep the tuples where
_._1!=0 //the first element isn't 0
).unzip //Unzip into a tuple of two lists, A and JA
-> //Make that the first element of another 2-tuple with IA:
x.scanLeft(0)( //Starting with 0
_+_.count(_!=0) //Add the number of nonzero elements in each row
) //Keep the intermediate results, giving us IA
IA[0] = 0
completely unnecessary? It's only needed to defineIA[i] = IA[i − 1]...
, yet we could simply state that ifi-1 < 0
to use 0. That is, IA[0] is always equal to 0, therefor it can be compressed out (yes, I realize that this is a critique of the algorithm, not this challenge). \$\endgroup\$