How to create a numpy 2d array from database with null values

Question 1

I'm trying to work with 2d arrays that can be accessed by column name in Python. The data come from a database and may contain different types and null values. None is not accepted in the tuples, so I tried replacing each None with numpy.nan.

The code below works if there are no null values in the database. My final goal is a masked array, but at the moment I cannot even create a plain array.

import MySQLdb
import numpy
connection = MySQLdb.connect(host=server, user=user, passwd=password, db=db)
cursor = connection.cursor()
cursor.execute(query)
results = list(cursor.fetchall())
dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]
for index_r, row in enumerate(results):
    newrow = list(row)
    for index_c, col in enumerate(newrow):
        if col is None:
            newrow[index_c] = numpy.nan
    results[index_r] = tuple(newrow)
x = numpy.array(results, dtype=dt)

The resulting error is:

x = numpy.array(results, dtype=dtypes)
ValueError: cannot convert float NaN to integer

After performing fetchall, results contain something like:

[(10L, '*', Decimal('3.47'), 180L),
 (27L, ' ', Decimal('7.21'), None)]

Any idea how I can solve this problem? Thank you!

Answer 1

There is no integer representation of NaN. You can either switch to floating point, or construct the mask while filling the array:

>>> import numpy as np
>>> values = [1, 2, None, 4]
>>> arr = np.empty(len(values), dtype=np.int64)
>>> mask = np.zeros(len(values), dtype=bool)
>>> for i, v in enumerate(values):
...     if v is None:
...         mask[i] = True
...     else:
...         arr[i] = v
...
>>> np.ma.array(arr, mask=mask)
masked_array(data = [1 2 -- 4],
             mask = [False False True False],
       fill_value = 999999)
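
For the first option, switching to floating point, a minimal sketch might look like the following. It assumes every value in the column can be represented as a float; the names as_float and masked are only illustrative.

```python
import numpy as np

values = [1, 2, None, 4]

# Replace None with NaN; this only works because the dtype is float.
as_float = np.array([np.nan if v is None else v for v in values], dtype=float)

# Optionally turn the NaN entries into a mask afterwards.
masked = np.ma.masked_invalid(as_float)
```

The trade-off is that integer columns become floats, so very large integers may lose precision.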

Comment 1 (on Answer 1)

But the problem arises when using tuples. If you define a list of tuples, it throws exceptions, e.g.:

values = [(1, 2), (None, 4)]
arr = np.empty(len(values), dtype=[('c1', np.int64), ('c2', np.int64)])

Comment 2 (reply from the author of Answer 1)

@tetrarquis: then use np.empty((len(values), len(values[0]))). What I posted is just an example; you'll have to adapt it to your use case.
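
One way to read that suggestion is as a plain 2d array with a 2d mask. This is only a sketch under that assumption: it uses np.zeros instead of np.empty so the masked slots hold a defined value, and it requires all columns to share one dtype, so the column names are lost.

```python
import numpy as np

values = [(1, 2), (None, 4)]
n_rows, n_cols = len(values), len(values[0])

# One shared dtype for the whole 2d array; masked slots stay at 0.
arr = np.zeros((n_rows, n_cols), dtype=np.int64)
mask = np.zeros((n_rows, n_cols), dtype=bool)

for i, row in enumerate(values):
    for j, v in enumerate(row):
        if v is None:
            mask[i, j] = True   # remember that this cell was NULL
        else:
            arr[i, j] = v

masked = np.ma.array(arr, mask=mask)
```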

Answer 2

Building on larsmans's example, I think what you want would be:

    import numpy as np
    import numpy.ma as ma

    values = [('<', 2, 3.5, 'as', 6), (None, None, 6.888893, 'bb', 9),
              ('a', 66, 77, 'sdfasdf', 45)]
    nrows = len(values)
    # np.int and np.float were removed from recent NumPy; the builtins are equivalent
    arr = ma.zeros(nrows, dtype=[('c1', 'S1'), ('c2', int), ('c3', float),
                                 ('c4', 'S8'), ('c5', int)])
    for i, row in enumerate(values):
        for j, cell in enumerate(row):
            if cell is None:
                arr.mask[i][j] = True   # flag the missing cell
            else:
                arr.data[i][j] = cell   # store the actual value
    print(arr)
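
The same idea can probably be written without the element-wise loop by building the data and the mask as two lists of tuples and passing both to np.ma.array. The sketch below uses the dtype from the question with simplified sample rows (plain floats instead of Decimal); the placeholders tuple is a hypothetical choice of filler values for the NULL cells.

```python
import numpy as np

dt = [('cig', int), ('u_CIG', 'S10'), ('e_ICO', float), ('VCO', int)]

# Simplified stand-in for cursor.fetchall(); None marks a NULL from the database.
rows = [(10, '*', 3.47, 180),
        (27, ' ', 7.21, None)]

# Hypothetical filler values, one per column, stored under the masked cells.
placeholders = (0, '', 0.0, 0)

data = [tuple(p if v is None else v for v, p in zip(row, placeholders))
        for row in rows]
mask = [tuple(v is None for v in row) for row in rows]

# Structured masked array: columns are accessible by name, NULLs are masked.
x = np.ma.array(data, mask=mask, dtype=dt)
```

With this layout, x['VCO'] gives the column as a masked array, and the NULL in the second row stays masked instead of being converted to an integer.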

Accepted answer (the mask-construction approach, "There is no integer representation of NaN"): Fred Foo · 2013-10-03 11:45:37Z

