
I'm unit testing code in Python 2.7 that writes numpy arrays via ndarray.tofile(fileHandle, ...). Since doing file I/O in unit tests is bad for a number of reasons, how do I substitute an in-memory byte stream for the file handle? (io.BytesIO failed to work because ndarray.tofile() asks it for a file name.)

ali_m
asked Jul 14, 2015 at 17:20

2 Answers


Shouldn't tobytes [1] and frombuffer [2] do what you need for testing purposes?

import numpy as np

m = np.random.rand(5, 3)                                # array to serialize
b = m.tobytes()                                         # raw bytes, no file I/O
mb = np.frombuffer(b, dtype=m.dtype).reshape(m.shape)   # reconstruct from the bytes
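For example, a self-contained test built around that idea might look like the sketch below; serialize() here is a hypothetical stand-in for whatever code you are actually testing:

import unittest
import numpy as np

def serialize(arr):
    # Hypothetical stand-in for the code under test: return the raw
    # bytes that would otherwise be written out with tofile().
    return arr.tobytes()

class SerializeTest(unittest.TestCase):
    def test_round_trip(self):
        m = np.random.rand(5, 3)
        b = serialize(m)
        mb = np.frombuffer(b, dtype=m.dtype).reshape(m.shape)
        np.testing.assert_array_equal(m, mb)

if __name__ == '__main__':
    unittest.main()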
answered Jul 15, 2015 at 3:04

1 Comment

That will work, assuming tofile() doesn't deviate from tobytes(). This is the best answer for the current state of numpy. It's unfortunate that tofile() doesn't accept a stream, so there's no way to unit test the tofile() API directly.
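For what it's worth, a quick sanity check along those lines could look like the sketch below; it uses a temporary file purely to exercise tofile(), and assumes a C-contiguous array written in binary mode (no sep argument):

import tempfile
import numpy as np

# Sanity check, not a unit test: for a C-contiguous array written in
# binary mode, tofile() and tobytes() should produce identical bytes.
a = np.random.rand(4, 7)
with tempfile.TemporaryFile() as f:
    a.tofile(f)
    f.seek(0)
    assert f.read() == a.tobytes()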

Would a tempfile.TemporaryFile suit your purposes?

It exposes the same interface as a normal file object, so you can pass it directly to np.ndarray.tofile(), and it will be deleted immediately when it is either explicitly closed or garbage collected:

import numpy as np
from tempfile import TemporaryFile

x = np.random.randn(1000)
with TemporaryFile() as t:
    x.tofile(t)
    # do your testing...
# t is closed and deleted

It will, however, reside temporarily on disk (usually in /tmp/ on a Linux machine), but I don't see an easy way to avoid I/O altogether, since .tofile() will ultimately need a valid OS-level file descriptor.
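If the test also needs to verify what was written, one option (an illustrative addition, not part of the question) is to rewind the handle and read the data back inside the same with block:

import numpy as np
from tempfile import TemporaryFile

x = np.random.randn(1000)
with TemporaryFile() as t:
    x.tofile(t)
    t.seek(0)                          # rewind before reading back
    y = np.fromfile(t, dtype=x.dtype)  # read the raw binary data back
np.testing.assert_array_equal(x, y)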

answered Jul 14, 2015 at 19:58

6 Comments

Building automated unit tests around file I/O introduces race conditions that make tests behave non-deterministically. If you add sleeps to ensure asynchronous file I/O has finished, you end up with slow unit tests, and slow tests don't scale to hundreds of unit tests that need to run in a few seconds. What you suggest is perfectly fine for a few system tests, but that's not what I'm doing.
It would be helpful if you could provide a bit more information about your requirements. How much data are you writing? Do you need to be able to read it back? What sort of race conditions are you concerned about? Do you absolutely need to use tofile?
I want to test an application that uses numpy. The application creates files. I need to write out anywhere from a few bytes to a few KB and confirm that the correct bytes are being produced. To write automated tests that behave consistently across hardware, I want to check the in-memory output stream before it is written to a file. This last part I'm finding difficult: although the API docs for ndarray.tofile() say it takes a file handle argument or a handle to a stream, it doesn't handle the BytesIO object I'm passing in. I fear that, as of now, it requires a real file handle. :-)
I'm looking at ndarray.tobytes() and doing some tests to see if I can assume tobytes() mirrors what tofile() writes out. If that works, I'll figure something out; maybe use it for mocking (see the sketch after these comments).
FWIW, here's the relevant method in the C source, which in turn calls npy_PyFile_Dup2 (note the use of os.dup to duplicate a file descriptor). This is all going on at C level, so I don't see an easy way to fake an open file via a Python object.
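Following up on the mocking idea above, one possible sketch is to patch the application's own writer seam and assert on tobytes(), so no file is ever touched. write_array() and produce_output() below are hypothetical stand-ins for the application code:

import numpy as np
try:
    from unittest import mock      # Python 3
except ImportError:
    import mock                    # Python 2.7: pip install mock

def write_array(arr, path):
    # Hypothetical production writer: the only code that touches the disk.
    with open(path, 'wb') as fh:
        arr.tofile(fh)

def produce_output(path):
    # Hypothetical production logic: compute an array, hand it to the writer.
    data = np.arange(6, dtype=np.float64)
    write_array(data, path)

def test_produce_output_bytes():
    # Replace the writer so no file is created, then assert on the bytes
    # the captured array would have written.
    with mock.patch(__name__ + '.write_array') as fake_write:
        produce_output('ignored.bin')
    (arr, _path), _kwargs = fake_write.call_args
    assert arr.tobytes() == np.arange(6, dtype=np.float64).tobytes()

test_produce_output_bytes()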