Finding lines in .txt file that contain keywords from two different set()

A S aishan0403 at gmail.com
Sun Sep 8 12:02:55 EDT 2019


My problem is seemingly profound but I hope to make it sound as simplified as possible.....Let me unpack the details..:
1. I have one folder of Excel (.xlsx) files that serve as a data dictionary.
-In Cell A1, the data source name is written in between brackets
-In Cols C:D, it contains the data field names (It could be in either col C or D in my actual Excel sheet. So I had to search both columns
-*Important: I need to know which data source the field names come from
2. I have another folder of Text (.txt) files that I need to parse through to find these keywords.
These are the folders used for a better reference ( https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq ). The files are found in the folder.
This is the code I have thus far...:
import os, sys
from os.path
import join
import re
import xlrd
from xlrd import open_workbook
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlsxwriter
#All the paths
dict_folder = 'C:/Users/xxxx/Documents/xxxx/Test Excel' 
text_folder = 'C:/Users/xxxx/Documents/xxxx/Text'
words = set()
fieldset = set()
for file in os.listdir(dict_folder):
if file.endswith(".xlsx"):
 wb1 = load_workbook(join(dict_folder, file), data_only = True)
 ws = wb1.active
 #Here I am reading and printing all the data source names set(words) in the excel dictionaries:
 cellvalues = ws["A1"].value
 wordsextract = re.findall(r"\((.+?)\)", str(cellvalues))
 results = wordsextract[0]
 words.add(results)
 print(results)
 for rowofcellobj in ws["C" : "D"]:
 for cellobj in rowofcellobj:
 #2. Here I am printing all the field names in col C & D in the excel dictionaries:
 data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
 if data != []:
 fields = data[0]
 fieldset.add(fields)
 print(fieldset)
 #listing = str.remove("")
 #print(listing) 
#Here I am reading the name of each .txt file to the separate .xlsx file:
for r, name in enumerate(os.listdir(text_folder)):
 if name.endswith(".txt"):
 print(name)
#Reading .txt file and trying to make the sentence into words instead of lines so that I can compare the individual .txt file words with the .xlsx file 
txtfilespath = os.chdir("C:/Users/xxxx/Documents/xxxx/Text")
#Here I am reading and printing all the words in the .txt files and compare with the excel Cell A1:
for name in os.listdir(txtfilespath):
 if name.endswith(".txt"):
 with open (name, "r") as texts:
 # Read each line of the file:
 s = texts.read()
 print(s)
 #if .txt files contain.....() or select or from or words from sets..search that sentence and extract the common fields
 result1 = []
 parens = 0
 buff = ""
 for line in s:
 if line == "(":
 parens += 1
 if parens > 0:
 buff += line
 if line == ")":
 parens -= 1
 if not parens and buff:
 result1.append(buff)
 buff = ""
 set(result1)
#Here, I include other keywords other than those found in the Excel workbooks 
 checkhere = set() 
 checkhere.add("Select")
 checkhere.add("From")
 checkhere.add("select")
 checkhere.add("from")
 checkhere.add("SELECT")
 checkhere.add("FROM")
 # k = list(checkhere)
 # print(k) 
 #I only want to read/ extract the lines containing brackets () as well as the keywords in the checkhere set. So that I can check capture the source and field in each line:
 #I tried this but nothing was printed......
 for element in checkhere:
 if element in result1:
 print(result1)
My desired output for the code that could not be printed when I tried is:
(/* 1.select_no., biiiiiyyyy FROM apple_x_Ex_x */ 
 proc sql; "TRUuuuth")
(/* 1.xxxxx FROM xxxxx*/ 
proc sql; "TRUuuuth")
(SELECT abc AS abc1, ab33_2_ AS mon, a_rr, iirir_vf, jk_ff, sfa_jfkj
 FROM &orange..xxx_xxx_xxE
 where (asre(kkk_ix as format 'xxxx-xx') gff &bcbcb_hhaha.) and 
 (axx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
 )
 (/* 1.select_no. FROM apple_x_Ex_x */ 
 proc sql; "TRUuuuth")
 (SELECT abc AS kfcccc, mcfg_2_ AS dokn, b_rr, jjhj_vf, jjjk_hj, fjjh_jhjkj
 FROM &bfbd..pear_xxx_xxE
 where (afdfe(kkffk_ix as format 'xxxxd-xx') gdaff &bcdadabcb_hdahaha.) and 
 (axx(xx_ix as format 'xxxx-xx') lec &jgjsdfdf_vnv.)
 )
After which, if I'm able to get the desired output above, I will then compare these lines against the word set() and the fieldset set().
Any help would really be appreciated here..thank you


More information about the Python-list mailing list

AltStyle によって変換されたページ (->オリジナル) /