
I am using numpy and pandas to concatenate a number of heterogeneous values into a single array.

np.concatenate((tmp, id, freqs))

Here are the exact values:

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

The dimensions of tmp, 17232, and freqs are as follows:

[in] tmp.shape
[out] (4,)
[in] np.array(17232).shape
[out] ()
[in] freqs.shape
[out] (1,)

I have also tried casting them all to numpy arrays, to no avail.

Note that the variable freqs will frequently have more than one value.

However, with both the np.concatenate and np.append functions I get the following error:

*** ValueError: all the input arrays must have same number of dimensions

These all have the same number of columns (0), so why can't I concatenate them with either of the numpy methods described above?

All I'm looking to obtain is [(tmp), 17232, (freqs)] in one single-dimensional array, which is to be appended to the end of a pandas dataframe.

Thanks.

Update

It appears I can concatenate the two existing arrays:

np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)

However, the integer, even when cast to an array, cannot be used in concatenate.

np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions

What does work, however, is nesting append and concatenate:

np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
 0.022831050228310501], dtype=object)

This is kind of messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?

askewchan
asked Mar 24, 2014 at 19:52
  • Not great, but at least it doesn't care which things are 0d: np.concatenate(map(np.atleast_1d, [tmp, id, freqs])) Commented Mar 24, 2014 at 20:27
  • @askewchan that is nice, the atleast_1d function hadn't crossed my radar. Thanks Commented Mar 24, 2014 at 22:47

1 Answer

The problem is that id, and later the integer np.array(17571), are zero-dimensional: numpy can convert them to arrays, but the result has shape (), and concatenate requires all of its inputs to have the same number of dimensions (here, one). See here how numpy decides whether an object can be converted automatically to a numpy array or not.

The solution is to wrap id in a list or tuple, so that numpy treats it as a one-element 1D array_like structure.

It all boils down to

np.concatenate((tmp, (id,), freqs))

or

np.concatenate((tmp, [id], freqs))

To avoid this sort of problem when handling input variables in functions that use numpy, you can use atleast_1d, as pointed out by @askewchan. See also this question/answer.

Basically, if you are unsure if in different scenarios your variable id will be a single str or a list of str, you are better off using

np.concatenate((tmp, np.atleast_1d(id), freqs))

because the two options above will fail if id is already a list/tuple of strings.
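As a minimal sketch of that approach, using the values from the question: the helper name flat_concat is hypothetical, not part of numpy. It promotes every part with np.atleast_1d before concatenating, so it should not matter whether a given part is a scalar, a list, or an array. A list comprehension is used instead of map because, on Python 3, map returns an iterator rather than a sequence.

```python
import numpy as np

tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)

def flat_concat(*parts):
    """Hypothetical helper: promote each part to at least 1-D, then concatenate."""
    return np.concatenate([np.atleast_1d(p) for p in parts])

# A single string id and a list of string ids both work.
single = flat_concat(tmp, "id_23728", freqs)
several = flat_concat(tmp, ["id_1", "id_2"], freqs)

print(single.shape)   # (6,)
print(several.shape)  # (7,)
```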

EDIT: It may not be obvious why np.array(17571) cannot be concatenated directly. This happens because np.array(17571).shape == (), so it has no dimensions and is not iterable.
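The point about dimensions can be checked directly. The sketch below only inspects ndim and shape and catches the ValueError; the variable names are illustrative, and the exact error message may differ between numpy versions.

```python
import numpy as np

scalar = np.array(17571)
print(scalar.ndim, scalar.shape)    # 0 ()

# atleast_1d turns the 0-d array into a 1-d array with one element.
wrapped = np.atleast_1d(scalar)
print(wrapped.ndim, wrapped.shape)  # 1 (1,)

# Mixing a 1-d array with a 0-d array is rejected by concatenate.
failed = False
try:
    np.concatenate([np.array([1, 2]), scalar])
except ValueError as exc:
    failed = True
    print("concatenate failed:", exc)

# After promotion the same call succeeds.
joined = np.concatenate([np.array([1, 2]), wrapped])
```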

answered Mar 24, 2014 at 21:33