Problem Statement:
- Extract the filename and read if filename has jQuery on it, put them aside.
- Second step will be to check one more and see if any jQuery files has
ui
in the filename.
Input:
script_ref1 = """<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
<script type="text/javascript" src="js/jquery-ui.min.js"/>
<script type="text/javascript" src="js/abc_bsub.js"/>
<script type="text/javascript" src="js/abc_core.js"/>
<script type="text/javascript" src="js/abc_explore.js"/>
<script type="text/javascript" src="js/abc_qaa.js"/>"""
OR
script_ref2 = """<script type="text/javascript" src="js/jquery-1.9.1.min.js"/>
<script type="text/javascript" src="js/jquery-migrate-1.2.1.min.js"/>
<script type="text/javascript" src="js/abc_bsub.js"/>
<script type="text/javascript" src="js/abc_core.js"/>
<script type="text/javascript" src="js/abc_explore.js"/>
<script type="text/javascript" src="js/abc_qaa.js"/>"""
And:
div_attr_info = {"data-js":"jquery-1.8.1.min.js, jquery-ui-1.2.min.js, custom1.js, custom2.js"}
Output:
for script_ref1
doc_js: [' custom1.js', ' custom2.js']
for script_ref2
doc_js: [' jquery-ui-1.2.min.js', ' custom1.js', ' custom2.js']
Algorithm:
- Set
jquery_flag
andjquery_ui_flag
flag toFalse
. - Extracting specific src attributes from script tags.
- Iterate on step2 result.
- Set
jquery_flag
to True becausejquery
is substring in the file list. - Check
ui
is substring or not. and setjquery_ui_flag
according to it. - Get JS file list from the dictionary for comparison.
- Iterate and Do comparison.
- Print result.
Can the code be improved or optimized? Can I reduce the runtime?
jquery_flag = False
jquery_ui_flag = False
for ii in re.findall('src="js/([^"]*jquery[^"]*)"', script_ref):
jquery_flag = True
if "ui" in ii:
jquery_ui_flag = True
tmp = div_attr_info["data-js"].split(",")
doc_js = []
for ii in tmp:
if jquery_ui_flag and "jquery" in ii:
pass
elif jquery_flag and not jquery_ui_flag and "jquery" in ii and "ui" not in ii:
pass
else:
doc_js.append(ii)
print "doc_js:", doc_js
2 Answers 2
Rather than performance, I suggest tackling readability first.
- Put code in a function: Figure out what it is that the code is trying to do, and package it in a function. This also forces you to assign a name that summarizes its purpose. It also forces you to define what the inputs and outputs are. A docstring would also be tremendously helpful.
- Use better variable names: There is always a more descriptive name than
tmp
. The dummy iteration variableii
is weird too — is it some kind of integer? Also, variables namedsomething_flag
tend to indicate excessively procedural thinking. Avoid procedural loops: In Python, loops of the form
some_list = [] for element in iterable: if condition(element): some_list.append(element)
are better expressed using list comprehensions:
some_list = [element for element in iterable if condition(element)]
Also, loops of the form
flag = False for element in iterable: if condition(element): flag = True break
are better written using
any()
.Use regular expressions effectively: Instead of
"jquery" in ii and "ui" not in ii
, use a regular expression with a negative lookahead assertion.div_attr_info["data-js"].split(",")
is a bit sloppy, in that it leaves leading spaces (but not consistently). Usere.split(r',\s*', ...)
instead.
Suggested solution
import re
def deduplicate_jquery(want_scripts, have_scripts=[]):
"""
Return a copy of want_scripts, omitting any redundant mentions of "jquery"
and "jquery-ui" if they are already listed in have_scripts.
"""
jquery_ui = re.compile(r'jquery-ui\b')
jquery = re.compile(r'jquery(?!-ui)\b')
have_jquery_ui = any(jquery_ui.match(s) for s in have_scripts)
have_jquery = have_jquery_ui or any(jquery.match(s) for s in have_scripts)
return [
want for want in want_scripts if
not (have_jquery_ui and jquery_ui.match(want)) and
not (have_jquery and jquery.match(want))
]
doc_js = deduplicate_jquery(
re.split(r',\s*', div_attr_info["data-js"]),
[m.group(1) for m in re.finditer(r'src="js/([^"]*)"', script_ref)]
)
print 'doc_js:', doc_js
-
\$\begingroup\$ Thank you, I will update code. and again past in question. \$\endgroup\$Vivek Sable– Vivek Sable2015年06月11日 10:45:18 +00:00Commented Jun 11, 2015 at 10:45
Since your post is a little unclear to me, I'm just going to point out a few small things that could be improved quiet a bit.
- The program logic could be separated out into different functions. This allows for easy re-usability.
- You should flesh out this program with comments for clarity. There are two types of comments, regular comments, and docstrings. Regular comments start with a
#
character, followed by a description. Docstrings describe functions and classes, like so.
def my_function(args):
"""
This is a docstring. Put a description
of your function here.
"""
...
class MyClass(object):
"""
Put a description of your class here.
"""
...
- You have some bad variable names, like
ii
, ortmp
. Variable names should generally describe what the variable's purpose is.
There's not much for me to review, but if you want me to cover anything else, just mention it in the comments, and I'll see if I can cover it.