1

I'm trying to work with 2D arrays in Python that can be accessed by column name. The data come from a database and may contain different types and null values. NoneType is not allowed in the tuples, so I tried replacing the nulls with np.nan.

The code below works if there are no null values in the database. My final goal is a masked array, but I cannot even create a plain array.

import MySQLdb
import numpy
connection = MySQLdb.connect(host=server, user=user, passwd=password, db=db)
cursor = connection.cursor()
cursor.execute(query)
results = list(cursor.fetchall())
dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]
for index_r, row in enumerate(results):
    newrow = list(row)
    for index_c, col in enumerate(newrow):
        if col is None:
            newrow[index_c] = numpy.nan
    results[index_r] = tuple(newrow)
x = numpy.array(results, dtype=dt)

The resulting error is:

x = numpy.array(results, dtype=dtypes)
ValueError: cannot convert float NaN to integer

After calling fetchall, results contains something like:

[(10L, '*', Decimal('3.47'), 180L),
 (27L, ' ', Decimal('7.21'), None)]

Any idea how I can solve this problem? Thank you!

Erik Kaplun
asked Oct 3, 2013 at 11:38

2 Answers

2

There is no integer representation of NaN. You can either switch to floating point, or construct the mask while filling the array:

>>> import numpy as np
>>> values = [1, 2, None, 4]
>>> arr = np.empty(len(values), dtype=np.int64)
>>> mask = np.zeros(len(values), dtype=bool)  # plain bool works across NumPy versions
>>> for i, v in enumerate(values):
...     if v is None:
...         mask[i] = True
...     else:
...         arr[i] = v
...
>>> np.ma.array(arr, mask=mask)
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)
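
The other option mentioned above, switching to floating point, could look roughly like this. It is only a sketch: the names `as_float` and `masked` are made up for illustration, and it assumes you can accept float storage for a column that was originally integer.

```python
import numpy as np

values = [1, 2, None, 4]

# NaN is representable in a float dtype, so replacing None by NaN works here.
as_float = np.array([np.nan if v is None else v for v in values], dtype=float)

# masked_invalid masks every NaN (and inf) entry of the array.
masked = np.ma.masked_invalid(as_float)
```

The trade-off is that the values are no longer stored as integers, which may matter for large IDs or exact comparisons.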
answered Oct 3, 2013 at 11:45

2 Comments

But the problem arises when using tuples. A list of tuples throws exceptions, e.g. values = [(1, 2), (None, 4)] arr = np.empty(len(values), dtype=[('c1', np.int64),('c2', np.int64)])
@tetrarquis: then use np.empty((len(values), len(values[0]))). What I posted is just an example; you'll have to adapt it to your use case.
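
For reference, the 2D adaptation suggested in the comment above might look like the following sketch. It assumes every column is an integer, so the result is a plain 2D array without column names; the variable `masked` is just an illustrative name.

```python
import numpy as np

values = [(1, 2), (None, 4)]

# One row per tuple, one column per tuple element, all stored as int64.
arr = np.empty((len(values), len(values[0])), dtype=np.int64)
mask = np.zeros(arr.shape, dtype=bool)

for i, row in enumerate(values):
    for j, v in enumerate(row):
        if v is None:
            mask[i, j] = True   # leave arr[i, j] uninitialized, it is masked anyway
        else:
            arr[i, j] = v

masked = np.ma.array(arr, mask=mask)
```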
0

Building on larsmans' example, I think what you want is:

    import numpy as np
    import numpy.ma as ma
    values = [('<', 2, 3.5, 'as', 6), (None, None, 6.888893, 'bb', 9),
              ('a', 66, 77, 'sdfasdf', 45)]
    nrows = len(values)
    # Plain int/float stand in for the np.int/np.float aliases, which newer NumPy no longer provides.
    arr = ma.zeros(nrows, dtype=[('c1', 'S1'), ('c2', int), ('c3', float),
                                 ('c4', 'S8'), ('c5', int)])
    for i, row in enumerate(values):
        for j, cell in enumerate(row):
            if cell is None:
                arr.mask[i][j] = True   # mark the missing cell as masked
            else:
                arr.data[i][j] = cell
    print(arr)
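
Applied to the rows from the question, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `rows_to_masked` is a hypothetical name, the rows are written with Python 3 integers instead of `10L`, and `Decimal` values are converted to `float` explicitly. It fills a plain data array and a separate per-field mask, then combines them at the end.

```python
from decimal import Decimal

import numpy as np


def rows_to_masked(rows, dtype):
    """Build a structured masked array from DB rows, masking every None cell."""
    dtype = np.dtype(dtype)
    data = np.zeros(len(rows), dtype=dtype)
    # One boolean flag per field and per row.
    mask = np.zeros(len(rows), dtype=[(name, bool) for name in dtype.names])
    for i, row in enumerate(rows):
        for name, cell in zip(dtype.names, row):
            if cell is None:
                mask[name][i] = True
            elif isinstance(cell, Decimal):
                data[name][i] = float(cell)
            else:
                data[name][i] = cell
    return np.ma.array(data, mask=mask)


dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]
rows = [(10, '*', Decimal('3.47'), 180),
        (27, ' ', Decimal('7.21'), None)]
result = rows_to_masked(rows, dt)
```

Columns are then available by name, e.g. `result['VCO']`, with the null entry masked rather than replaced by NaN.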
answered Oct 4, 2013 at 8:56
