I have a large SQL file with about 1 million INSERT statements in it. About 6,000 of them are corrupted with weird characters that I need to remove so I can insert them into my DB.

Ex: INSERT INTO BX-Books VALUES ('2268032019','Petite histoire de la d�©sinformation','Vladimir Volkoff',1999,'Editions du Rocher','http://images.amazon.com/images/P/2268032019.01.THUMBZZZ.jpg','http://images.amazon.com/images/P/2268032019.01.MZZZZZZZ.jpg','http://images.amazon.com/images/P/2268032019.01.LZZZZZZZ.jpg');

I want to remove only the weird characters and leave all of the normal ones.

I tried using the following code to do so:

import fileinput
import string
fileOld = open('text1.txt', 'r+')
file = open("newfile.txt", "w")
for line in fileOld: #in fileinput.input(['C:\Users\Vashista\Desktop\BX-SQL-Dump\test1.txt']):
 print(line)
 s = line
 printable = set(string.printable)
 filter(lambda x: x in printable, s)
 print(s)
 file.write(s)

But it doesn't seem to be working. When I print s, it is the same as what was printed for line, and what's stranger is that nothing gets written to the file.

Any advice or tips on how to solve this would be useful.
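
For reference, here is a minimal sketch of what the attempt above seems to be aiming for, not a definitive fix. Two things in the original code look like the likely culprits: `filter(...)` returns a new object rather than modifying `s`, and its result is thrown away; and the output file is never closed, so buffered writes may never reach disk. The names `strip_unprintable` and `clean_file` are hypothetical, and reading the dump as UTF-8 is an assumption about the file.

```python
import string

# string.printable covers ASCII letters, digits, punctuation and whitespace only.
PRINTABLE = set(string.printable)


def strip_unprintable(text):
    """Return a copy of text without characters outside string.printable."""
    # Build a new string; str objects are immutable, so nothing is changed in place.
    return "".join(ch for ch in text if ch in PRINTABLE)


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, dropping unprintable characters."""
    # Assumption: the dump is (mostly) UTF-8. errors="replace" turns undecodable
    # bytes into U+FFFD instead of raising; U+FFFD is then filtered out as well.
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_unprintable(line))
    # Leaving the with-block closes both files, which flushes the output.
```

Calling `clean_file('text1.txt', 'newfile.txt')` would process the files named in the question. Note that this filter is ASCII-only, so it also drops legitimate accented letters such as é; if those need to survive, the file's real encoding would have to be identified first.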

Big_VAA

Remove Weird Characters using Python
