I have written this script in Python 2.7 and I was wondering where I could improve it or efficient. This is my first script so I don't now much. The script takes a text file from a folder removes blank spaces with list comprehension and then breaks it into smaller 20 elements and saves it each 20 elements as a definition into a dictionary with iterative numbered keys. Each key is then used to fetch the definition from the dictionary and sends each through the API one by one saving the output into another dictionary. The result dictionary is then converted to text and saved into a text file.
import glob
import os
import json
import paralleldots
api_key = "iyuDsKbgZSsCxVj6Moe37MzdqCUuvkpf33t6qS3X3cH"
paralleldots.set_api_key(api_key)
output = input(r"C:\Users\User\Desktop\trial")
txt_files = os.path.join("D:\\english\\stou\\check", '*.txt')
dic = {} #dictionary for storing list elements
res = {} #results dictionary for api output
def for_high (list_e): #defining a function for taking list elements and putting them into dic
no_keys = list_e / 20
dic = {
1: l[1:21]
}
i = 1
start_key_value = 21
while i <= no_keys:
i += 1
end_key_value = start_key_value + 20
dic.update({i: ''.join(l[start_key_value:end_key_value])})
start_key_value += 20
for x in dic: #creating a for loop for getting and saving the output and creating a new file
output = paralleldots.emotion(dic.get(x))
res.update({x: output})
with open(os.path.join("C:\\Users\\User\\Desktop\\trial", filename), 'w') as text_file:
text_file.write(json.dumps(res))
for txt_file in glob.glob(txt_files): # for loop for going through all the text files in the input directory
with open(txt_file, "r") as input_file:
filename = os.path.splitext(os.path.basename(txt_file))[0] + '.txt'
l = [l for l in input_file.readlines() if l.strip()]
list_e = int(len(l)) #no. of list elements variable
if list_e > 20: #checking if list has more than 20 elements
if list_e % 2 != 0: #checking if list has an odd no. of elements
list_e += 1
for_high(list_e)
else:
for_high(list_e)
else:
in_txt = paralleldots.emotion(l)
filename = os.path.splitext(os.path.basename(txt_file))[0] + '.txt'
with open(os.path.join("C:\\Users\\User\\Desktop\\trial", filename), 'w') as text_file:
text_file.write(str(in_txt))
-
1\$\begingroup\$ Please capitalize correctly. It makes it much easier to read your post. \$\endgroup\$FreezePhoenix– FreezePhoenix2018年08月09日 16:56:44 +00:00Commented Aug 9, 2018 at 16:56
-
1\$\begingroup\$ @FreezePhoenix Edited :) \$\endgroup\$Ziyad– Ziyad2018年08月09日 17:00:31 +00:00Commented Aug 9, 2018 at 17:00
1 Answer 1
I understand you are a beginner and you had the guts to post your code. You deserve some feedback and I will try.
Your code is hard to read
This is most probably why you didn't get answers till now. There is code repetition, global and reused variables, no clear control flow, and bad variable names. It is really hard to find the core functionality. So I start in no particular order.
Bad names
dic = {} #dictionary for storing list elements
Well I can see this is a dict. Bad name, bad comment. Use names describing the meaning, not the type. Comments shoud describe the hidden, not so obvious parts, that could not be expressed by names. Your whole comment #dictionary for storing list elements
can be split into #dictionary for storing ... elements
and ... lists ...
.
the first one is completely obvious, the other is just some type information again. No hints to the sort of content (should be in the name), no hints as to the key.
You could safely replace the whole line with guesswhat={}
.
Do not use globals or nonlocals for no reason
def for_high (list_e): #defining a function for taking list elements and putting them into dic
Uses filename
, res
, and l
. While one could expect that on res
and as it was declared at global scope, it is a complete no-go to use variables that look like temporaries in their scope. This renders your function absolutely untestable and your function will easily break if you alter the outer code. Pass required variables, so the reader/maintainer is aware they are used by the function.
Unlike what the comment suggests, your function currently only takes a list length. And unless for_high
is a well known / defined term in your community, it is a bad name again.
Do not repeat yourself (DNRY)
Your function has some code:
output = paralleldots.emotion(dic.get(x))
res.update({x: output})
with open(os.path.join("C:\\Users\\User\\Desktop\\trial", filename), 'w') as text_file:
text_file.write(json.dumps(res))
which is duplicated outside (slightly modified)
in_txt = paralleldots.emotion(l)
filename = os.path.splitext(os.path.basename(txt_file))[0] + '.txt'
with open(os.path.join("C:\\Users\\User\\Desktop\\trial", filename), 'w') as text_file:
text_file.write(str(in_txt))
How can something be either output
or in_txt
? Of course a function has a return value or output or result. None of these make a good name. What is returned?
For one code snippet filename
is explicitly set, for the other it magically pre-exists? So the function accesses a value from outside and the outer code overwrites the existing value with the very same value before using it.
This is pure horror. Also one of this file writes is done in a loop, the file is overwritten in each iteration. It shall either be written once or appended.
Create a function handling one element.
Simplify control flow
If both execution paths of an if
clause nearly do the same, check whether you can simplify it. Your
if list_e % 2 != 0: #checking if list has an odd no. of elements
list_e += 1
for_high(list_e)
else:
for_high(list_e)
can be simplified to (again add a useful comment):
if list_e % 2 != 0: # list length may not be odd because ...
list_e += 1
for_high(list_e)
but: are you sure it is a good idea to increase a list-length variable while the list itself stays odd?
Define numeric constants once
you have
no_keys = list_e / 20
dic = {1: l[1:21]}
start_key_value = 21
end_key_value = start_key_value + 20
start_key_value += 20
if list_e > 20: #checking if list has more than 20 elements
I do not think these are independent constants. So define this constant once and use that name throughout the code (again change the comment to have some meaning)
max_emoticon_len = 20
no_keys = list_e / max_emoticon_len
dic = {1: l[1:max_emoticon_len + 1]}
start_key_value = max_emoticon_len + 1
end_key_value = start_key_value + max_emoticon_len
start_key_value += max_emoticon_len
if list_e > max_emoticon_len: #checking if list has more than elements than desired to fit for ...
Other
Also list_e
is a bad name. e
mostly is used as name/pre-/postfix for elements for e in l:
. for no. of elements the term length
as in len()
is used. stick to those conventions to avoid irritation.
Also dic = {1: l[1:21]}
- are you sure you want to skip the first element? In Python, indexing starts at 0.
Also is it desired, that when having a list of length 40, the first sequence has length 20, and the last one has length 19?
Also dic = {1: l[1:21]}
hides the outer dic which therefor is never used and may be deleted
Also dic = {1: l[1:21]}
- why do you set this explicitly? Why not inside the loop?
Also dic.update({i: 'some'})
may be done by dic[i] = 'some'
. The same goes for res.
Also you handle lists with less than 20 elements in your main code. However, if the list is bigger than 20 elements, you do batches of 20 and drop the last ones. Desired?
-
\$\begingroup\$ I did my best to edit this post to improve its quality (spelling and grammar wise). I understand you may not be a native English speaker (neither am I), but please at least do a quick spell-check before posting. Thanks. \$\endgroup\$Daniel– Daniel2018年08月10日 12:55:19 +00:00Commented Aug 10, 2018 at 12:55