I'm developing an application that will hide information inside the quantization tables of JPEG files. It's called DQTsteg, if you want to know more about what I'm attempting to do.
Right now I'm developing the most basic component of this system, which is the part that reads quantization tables from files. So it'd be a good thing if this code were as fast as it can possibly be, as it'll be a very frequent operation.
Here's the whole code on GitHub, fewer than 93 lines long.
I have noticed that urllib.request calls are sometimes somewhat slow (above one second).
I've seen some people on Stack Overflow saying that some servers don't support byte-by-byte requests through urllib.request (you either download the whole file or you don't). I don't know whether these people are right, but is there any case where this urllib.request call could go wrong (so that I'd either be denied access to the target file, or loop endlessly over the first few bytes, or anything of that sort)?
Any suggestions are welcome. Here follows the code:
#!/usr/bin/env python
"""
QT_reader.py
Functions for reading quantization tables from files and URLs.
Below are some global variables defined according to the JPEG Standard.
"""
DEFINE_QUANTIZATION_TABLE = b'\xff\xdb'
SINGLE_TABLE_PAYLOAD_DATA = b'\x00\x43'
DOUBLE_TABLE_PAYLOAD_DATA = b'\x00\x84'
def load_file(jpgpath):
    """
    Loads a file from jpgpath. Direct acess not intended; use QT_get_single() or QT_get_all().
    Args:
        jpgpath (str): file path.
    Returns:
        jpgpath (object): file object.
    """
    if load_file.status == "web":
        import urllib.request
        jpgpath = urllib.request.urlopen(jpgpath)
    if load_file.status == "file":
        jpgpath = open(jpgpath,"rb")
    return jpgpath
def QT_get_single(jpgpath, status="file"):
    """
    Reads a single quantization table from file.
    Args:
        jpgpath (str): file path, local or URL.
        status (str): set to "file" for local files or set to "web" for URLs.
    Returns:
        tuple: A tuple with .tell() position for the DQT on [0] and the QT table itself on [1] as a bytearray.
    """
    if QT_get_single.status != "off":
        if status == "web":
            load_file.status = "web"
        elif status == "file":
            load_file.status = "file"
        load_file.jpgpath = jpgpath
        jpgpath = load_file(load_file.jpgpath)
    while True:
        while True:
            QT_buffer = jpgpath.read(2)
            if QT_buffer == DEFINE_QUANTIZATION_TABLE: break
            if QT_buffer == b'':
                raise EOFError("Reached end of file.")
        DQT_position = jpgpath.tell()
        QT_buffer = jpgpath.read(2)
        if QT_buffer == DOUBLE_TABLE_PAYLOAD_DATA:
            QT_buffer = jpgpath.read(130)
            break
        elif QT_buffer == SINGLE_TABLE_PAYLOAD_DATA:
            QT_buffer = jpgpath.read(65)
            break
        else:
            continue
    return ( DQT_position, bytearray(QT_buffer) )
def QT_get_all(jpgpath, status="file",stop=-1):
    """
    Reads multiple quantization tables from file. Reads all QTs by default.
    Args:
        jpgpath (str): file path, local or URL.
        status (str): set to "file" for local files or set to "web" for URLs.
        stop (int): Stop after retrieving the nth table. One retrieves one; two retrieves two; zero or default retrieves all.
    Returns:
        list: A list of QT_get_single() tuples.
    """
    QT_tables = []
    counter = 0
    load_file.status = status
    load_file.jpgpath = jpgpath
    jpgpath = load_file(load_file.jpgpath)
    QT_get_single.jpgpath = jpgpath
    QT_get_single.status = "off"
    while True:
        try:
            QT_buffer = QT_get_single(QT_get_single.jpgpath,QT_get_single.status)
            QT_tables.append(QT_buffer)
            counter += 1
        except: break
        if counter == stop: break
    return QT_tables
1 Answer
Honestly, I'm not very happy with this bit of documentation:
Direct [access] not intended;
The community feels that the appropriate way to convey such a "private" intention is with a leading _ underscore, so:
def load_file(jpgpath):
becomes
def _load_file(jpgpath):
You mention load_file.status. Based on the posted question, I have no idea what load_file is all about. In particular, I do not see a relevant import statement.
Is there some unfamiliar idiom you'd like to clue the reviewer in to, where we tack attributes onto a function object? Maybe we do that in some languages, but in Python it certainly is not a common idiom.
I am reading this signature:
def QT_get_single(jpgpath, status="file"):
Please stick to conventional identifiers. PEP 8 asks that you name it qt_get_single.
I am reading this:
while True:
    while True:
        QT_buffer = jpgpath.read(2)
        ...
and I have a lot of trouble with it.
Maybe that is the true spec for that file type? But I have a hard time believing that. At a minimum, please turn the infinite loop into a bounded loop.
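For what it's worth, my understanding of the JPEG layout is that after the two-byte SOI marker the header is a sequence of segments, each one a marker followed by a big-endian length that counts its own two bytes. Under that assumption the scan can hop from segment to segment with a hard cap instead of reading two bytes at a time forever. A rough sketch: the name iter_dqt_segments and the max_segments cap are my own inventions, and it stops at the first scan and ignores fill bytes and other corner cases.

```python
import struct

DQT = 0xDB  # define quantization table
SOS = 0xDA  # start of scan: entropy-coded data follows
EOI = 0xD9  # end of image


def iter_dqt_segments(stream, max_segments=1000):
    """Yield (payload_offset, payload) for each DQT segment before the first scan.

    Hypothetical helper: the loop runs at most `max_segments` times.
    """
    if stream.read(2) != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    offset = 2  # offset of the next marker, tracked by hand
    for _ in range(max_segments):
        header = stream.read(4)  # marker (2 bytes) + length (2 bytes)
        if len(header) < 4 or header[0] != 0xFF:
            return  # truncated or unexpected data: stop quietly
        marker = header[1]
        if marker in (SOS, EOI):
            return  # no more header segments to walk
        (length,) = struct.unpack(">H", header[2:4])
        if length < 2:
            raise ValueError("invalid segment length")
        payload = stream.read(length - 2)
        if len(payload) < length - 2:
            raise EOFError("segment is truncated")
        if marker == DQT:
            yield offset + 4, payload
        offset += 2 + length
```

It keeps its own offset, so it does not rely on tell(), and the offset it yields is the first byte after the length field, which is two bytes later than the position the posted code records.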
The 65 and 130 are magic numbers. Please handle them more gracefully, perhaps with MANIFEST_CONSTANTS.
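As far as I can tell, the numbers fall out of the segment layout: each 8-bit table is one precision/identifier byte plus 64 entries, and the length field counts its own two bytes, which is where 0x43 (67) and 0x84 (132) come from. A sketch with names of my own choosing, assuming 8-bit tables only:

```python
LENGTH_FIELD_SIZE = 2   # the length field counts its own two bytes
TABLE_INFO_SIZE = 1     # precision / table-identifier byte
TABLE_ENTRIES = 64      # one entry per coefficient of an 8x8 block

SINGLE_TABLE_SIZE = TABLE_INFO_SIZE + TABLE_ENTRIES  # 65 bytes
DOUBLE_TABLE_SIZE = 2 * SINGLE_TABLE_SIZE            # 130 bytes


def payload_size(length_field):
    """Number of payload bytes that follow a two-byte big-endian length field."""
    return int.from_bytes(length_field, "big") - LENGTH_FIELD_SIZE
```

With that in place, payload_size(SINGLE_TABLE_PAYLOAD_DATA) can replace the literal in read(65), and the two comparisons against hard-coded length bytes become optional.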
I am reading QT_get_all. Ok, now I see where those fn.attr settings came from. I feel that is a crazy convention to adopt. Perhaps you will choose to turn the functions into class methods and convert the attributes to self.attr variables.
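Something along these lines is what I have in mind: the state that currently lives in load_file.status and QT_get_single.jpgpath becomes instance attributes. This is only a sketch. QTReader is a hypothetical name, it works on bytes already in memory, and it keeps the posted code's naive strategy of looking for FF DB followed by one of the two expected lengths.

```python
class QTReader:
    """Hypothetical reader that keeps its scan position on the instance."""

    MARKER = b"\xff\xdb"
    # Length-field values accepted by the posted code, mapped to payload sizes.
    PAYLOAD_SIZES = {b"\x00\x43": 65, b"\x00\x84": 130}

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0  # where the next search starts

    def get_single(self):
        """Return (position, table) for the next DQT, or raise EOFError."""
        # Every pass moves self.pos forward, so the loop cannot run forever.
        while True:
            found = self.data.find(self.MARKER, self.pos)
            if found == -1:
                raise EOFError("Reached end of data.")
            position = found + 2  # offset just past the marker
            size = self.PAYLOAD_SIZES.get(self.data[position:position + 2])
            if size is None:
                self.pos = position  # not a length we recognise; keep looking
                continue
            start = position + 2
            self.pos = start + size
            return position, bytearray(self.data[start:start + size])

    def get_all(self, stop=0):
        """Collect tables until the data runs out or `stop` tables are found."""
        tables = []
        while stop <= 0 or len(tables) < stop:
            try:
                tables.append(self.get_single())
            except EOFError:
                break
        return tables
```

With the position stored on the instance there is no need for the "off" sentinel, and catching only EOFError means an unrelated failure is no longer swallowed by a bare except.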
Use from urllib import request rather than import urllib.request, because it will probably make the other bit look nicer: you just have to call request... rather than urllib.request...