I've been struggling to create a sub-array from specific elements of a first array.
Given a first array that looks like this (it comes from a txt file with two lines:
L1,(B:A:3:1),(A:C:5:2),(C:D:2:3)
L2,(C:E:2:0.5),(E:F:10:1),(F:D:0.5:0.5)):
code
import pandas as pd
toto = pd.read_csv("bd_2_test.txt",delimiter=",",header=None,names=["Line","1rst","2nd","3rd"])
matrix_toto = toto.values
matrix_toto
result
Line 1rst 2nd 3rd
0 L1 (B:A:3:1) (A:C:5:2) (C:D:2:3)
1 L2 (C:E:2:0.5) (E:F:10:1) (F:D:0.5:0.5)
How can I transform it into an array like this one?
array([['B', 'A', 3, 1],
['A', 'C', 5, 2],
['C', 'D', 2, 3],
['C', 'E', 2, 0.5],
['E', 'F', 10, 1],
['F', 'D', 0.5, 0.5]], dtype=object)
I tried vectorizing, but I only get the second character of each element of the array.
np.vectorize(lambda s: s[1])(matrix_toto)
array([['1', 'B', 'A', 'C'],
['2', 'C', 'E', 'F']], dtype='<U1')
1 Answer
I am not sure that what you are trying is the optimal solution to your real problem. But, well, staying as close as possible to your initial attempt:
# We need a regular expression to transform a string "(x:y:z:t)" into an array ["x","y","z","t"]
import re
import numpy as np
# tr does that transformation
tr=lambda s: np.array(re.findall(r'\(([^:]*):([^:]*):([^:]*):([^:]*)\)', s)[0])
# Alternative version, without re (and maybe best, I've benchmarked it)
tr=lambda s: s[1:-1].split(':') # s[1:-1] removes the first and last characters, i.e. the parentheses; .split(':') returns a list of the substrings separated by colons.
# trv is the vectorization of tr
# We need the signature, because the return value is itself an array.
trv=np.vectorize(tr, signature='()->(n)')
result=trv(matrix_toto[:,1:].flatten())
Note that matrix_toto[:,1:] is your matrix without the first column (the line name), and matrix_toto[:,1:].flatten() flattens it, so we have one entry per cell of your initial array (excluding the line name). Each of those cells is a string "(x:y:z:t)", which trv transforms into an array.
Result is
array([['B', 'A', '3', '1'],
['A', 'C', '5', '2'],
['C', 'D', '2', '3'],
['C', 'E', '2', '0'],
['E', 'F', '1', '1'],
['F', 'D', '0', '0']], dtype='<U1')
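One caveat about the output above: its dtype is '<U1', so every value is cut down to a single character, which is why '0.5' shows up as '0' and '10' as '1'. Here is a minimal sketch of one way around that, assuming a plain list comprehension is acceptable in place of np.vectorize. The helper names parse_cell and to_number are hypothetical, and matrix_toto is rebuilt inline here instead of being read from the file:

```python
import numpy as np

# Stand-in for the matrix read from the txt file (assumption: same content).
matrix_toto = np.array(
    [
        ["L1", "(B:A:3:1)", "(A:C:5:2)", "(C:D:2:3)"],
        ["L2", "(C:E:2:0.5)", "(E:F:10:1)", "(F:D:0.5:0.5)"],
    ],
    dtype=object,
)


def to_number(text):
    """Convert a numeric string to int when it is whole, else to float."""
    value = float(text)
    return int(value) if value.is_integer() else value


def parse_cell(cell):
    """Turn "(x:y:z:t)" into [x, y, z, t], with z and t as numbers."""
    a, b, c, d = cell.strip()[1:-1].split(":")
    return [a, b, to_number(c), to_number(d)]


# Drop the line-name column, flatten, and parse each cell.
# dtype=object keeps strings and numbers side by side without truncation.
result = np.array(
    [parse_cell(cell) for cell in matrix_toto[:, 1:].flatten()],
    dtype=object,
)
```

This should give a (6, 4) object array whose last two columns hold numbers rather than one-character strings.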
Obviously you need only one of the two tr=... lines. I left both in the code because I don't know the exact specification of those (x:y:z:t) patterns, so you may need to adapt one of the two versions.
The (B:A:3:1) items in the first array: are they tuples? Strings? The values array is a (2,4) array of strings. Your lambda just takes the 2nd character from each string. So for one thing you need to skip the "L1" element. You also need to split the other strings into 4 parts: drop the () and split on :. I don't know if pandas has an 'expand' that will help. But at the numpy level you have a lot of string manipulation to do first.
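On the pandas side, Series.str.split does take expand=True, which puts each piece in its own column. A rough sketch under a few assumptions: the DataFrame is rebuilt inline rather than read from bd_2_test.txt, and the column names src, dst, v1 and v2 are made up for illustration:

```python
import pandas as pd

# Stand-in for the DataFrame returned by read_csv (assumption: same content).
toto = pd.DataFrame(
    [
        ["L1", "(B:A:3:1)", "(A:C:5:2)", "(C:D:2:3)"],
        ["L2", "(C:E:2:0.5)", "(E:F:10:1)", "(F:D:0.5:0.5)"],
    ],
    columns=["Line", "1rst", "2nd", "3rd"],
)

# Skip the line-name column and put every cell into one Series, row by row.
cells = pd.Series(toto.iloc[:, 1:].to_numpy().ravel())

# Drop the parentheses, then split on ":" into four columns.
parts = cells.str.strip("()").str.split(":", expand=True)
parts.columns = ["src", "dst", "v1", "v2"]  # hypothetical column names

# The last two columns are still strings at this point; make them numeric.
for col in ["v1", "v2"]:
    parts[col] = pd.to_numeric(parts[col])
```

If a NumPy array is needed afterwards, parts.to_numpy() converts the frame back.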