I know string slicing and indexing are fairly straightforward, but I can't seem to make my code work here. Sorry, I am a newbie and just learning!
I am trying to check whether each item in a list (called "lines") contains a certain string. The strings are pulled from another list (called "suffixes"), and I want to get the index of the match so I can replace the suffix's first character, a whitespace, with a dash "-".
However the str.find method is returning -1 in most cases, meaning the string is not found, except in one case where it returns 43 when the first string in "suffixes" is found in an item in "lines".
Example output:
Acephate Butachlor Cycloate Dimethoate (Sum) -1
Aldicarb Captan (Sum) Cyprodinil Disulfoton -1
Aldicarb (Sum) Carbaryl Cyromazine Disulfoton (Sum) -1
Amitraz Carboxine DDT (Sum) Dodemorph -1
Azamethiphos Chlorantraniliprole Deltamethrin Endosulfan (A+B+Sulf) -1
Azinphos-ethyl Chlordane Demeton Endosulfan Alfa 43
Azinphos-methyl Chlordane Trans Demeton-S-methyl-sulfone Endosulfan Beta -1
I suspect it is ONLY searching for the first suffix, but I have followed the syntax I found in multiple places, so I can't see why.
lines = ['', 'Abamectin Buprofezin Cyazofamid Dimethoate', '', 'Acephate Butachlor Cycloate Dimethoate (Sum)', '', 'Acequinocyl Butocarboxim Cycloxydim Dimethomorph', '', 'Acetamiprid Butralin Cyflufenamid Diniconazole', '', 'Acetochlor Cadusafos Cyfluthrin Dinocap', '', 'Acrinathrin Captafol Cymoxanil Dinotefuran', '', 'Alachlor Captan Cyproconazole Diphenylamine', '']
"""If there are any suffixes, join each to the preceding word with a dash, so the data can then be split by spaces."""
suffixes=[" Alfa"," Beta"," Sulfate"," sulfoxide"," (DCPA)"," (Sum)"," (Folpet)"," sulphone"," butoxide"," Methyl"," (A+D)"," (THPI)"," (A+B+Sulf)"]
for line in lines:
    if any(suffix in line for suffix in suffixes):
        print(line, line.find(suffix))
        ind=line.find(suffix)
        line[ind].replace(' ','-')
Once I have joined some of the words with their suffixes, using a "-", then I will split the rest of the items in "lines" into new items splitting by white space.
The issue I am facing: if any of the strings in "suffixes" (note that each has whitespace at the start) is found as a substring of an item in the list "lines", I want its index to be returned. This is not happening currently. Instead, the output shows only the one case where the first string in "suffixes" is found, and then the loop finishes.
If I add the line: if index != -1: print(line,line.find(suffix))
Then my expected output would be something like:
Acephate Butachlor Cycloate Dimethoate (Sum) 38
Azamethiphos Chlorantraniliprole Deltamethrin Endosulfan (A+B+Sulf) 56
etc....
Edit: Although my problem has been solved another way I would like to understand why my code is not returning the index as I want.
- What is the actual issue? I read your question 2 times. – Kunal Mukherjee, Apr 8, 2019 at 15:03
- It would be helpful if you posted the expected result of the algorithm for your sample lines. – Lee Jenkins, Apr 8, 2019 at 15:04
- Your input and output don't match. – Kunal Mukherjee, Apr 8, 2019 at 15:10
- I will edit my question to clarify the issue and include some expected results. – Westworld, Apr 8, 2019 at 15:39
1 Answer
There's no need for indexing, you can just try the replacement. If the suffix isn't present, then it just won't be replaced.
for suffix in suffixes:
    lines = [line.replace(suffix, suffix.replace(" ", "-")) for line in lines]
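As for why the original loop prints -1: the name `suffix` inside `any(...)` is local to that generator expression, so the `suffix` passed to `line.find(suffix)` is not the one that matched. It is whatever `suffix` was bound to elsewhere, presumably " Alfa" left over from earlier code, which would explain the single 43. Separately, strings are immutable, so `line[ind].replace(' ', '-')` builds a new one-character string and discards it. If you do want the indices, here is a minimal sketch; the helper name `find_suffixes` and the shortened sample lists are just for illustration:

```python
def find_suffixes(line, suffixes):
    """Return a (suffix, index) pair for every suffix that occurs in line."""
    return [(s, line.find(s)) for s in suffixes if s in line]


suffixes = [" Alfa", " Beta", " (Sum)", " (A+B+Sulf)"]
lines = [
    "Acephate Butachlor Cycloate Dimethoate (Sum)",
    "Azinphos-ethyl Chlordane Demeton Endosulfan Alfa",
    "Abamectin Buprofezin Cyazofamid Dimethoate",
]

for line in lines:
    for suffix, ind in find_suffixes(line, suffixes):
        # Strings are immutable, so build a new string by slicing
        # around the space at position ind.
        fixed = line[:ind] + "-" + line[ind + 1:]
        print(fixed, ind)
```

With these sample lines the indices come out as 38 and 43, and the third line prints nothing because it contains no suffix.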
You may also be having problems with the case. You have "Ethyl" in your list of suffixes, but "ethyl" in the output.
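If case really is the problem, one option is a case-insensitive replacement with the `re` module. This is only a sketch: the helper name `join_suffixes` is made up, and it assumes every suffix starts with a single space, as in your list.

```python
import re


def join_suffixes(line, suffixes):
    """Replace the space before any known suffix with a dash, ignoring case."""
    if not suffixes:
        return line
    # Escape each suffix (minus its leading space) so that characters
    # such as "(" and "+" are matched literally.
    alternatives = "|".join(re.escape(s.strip()) for s in suffixes)
    pattern = " (" + alternatives + ")"
    # \1 re-inserts the matched text, so the original casing is kept.
    return re.sub(pattern, r"-\1", line, flags=re.IGNORECASE)


print(join_suffixes("Endosulfan alfa Dimethoate (Sum)", [" Alfa", " (Sum)"]))
```

Like `str.replace`, this matches a suffix anywhere it appears after a space, so it will also match when the suffix is only the start of a longer word.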