Debian GNU/Linux 11 (bullseye), grep (GNU grep) 3.6
I need to find a string in all files (doc, docx and pdf) in the current directory, but the grep command is not working for me:
grep -ril "word" .
It doesn't output anything. What's wrong?
1 Answer
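grep searches the raw bytes of each file, and these formats store their text in binary or compressed form. All three formats need to be converted to text before they can be searched using tools such as grep.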
For "old-style" .doc files, use catdoc:
catdoc file.doc | grep word
For OOXML .docx files, use docx2txt:
docx2txt < file.docx | grep word
or
docx2txt file.docx - | grep word
For PDF files, use pdfgrep:
pdfgrep word file.pdf
or pdftotext:
pdftotext file.pdf - | grep word
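To get something close to the original `grep -ril "word" .`, the per-format commands above can be combined into a small loop. This is only a sketch: the function names `to_text` and `search_docs` are made up for illustration, it assumes catdoc, docx2txt and pdftotext are installed, and it assumes file names do not contain newlines.

```shell
#!/bin/sh
# to_text FILE: write a plain-text rendering of FILE to stdout,
# choosing the converter by file extension (hypothetical helper).
to_text() {
    case "$1" in
        *.pdf)  pdftotext "$1" - ;;
        *.docx) docx2txt "$1" - ;;
        *.doc)  catdoc "$1" ;;
        *)      cat "$1" ;;          # anything else is treated as plain text
    esac
}

# search_docs WORD [DIR]: print the names of files under DIR (default: .)
# whose text contains WORD, ignoring case (hypothetical helper).
search_docs() {
    word=$1
    dir=${2:-.}
    find "$dir" -type f | while IFS= read -r file; do
        # Converter errors are discarded; stdin is closed off so a
        # converter cannot swallow the file list being read by the loop.
        if to_text "$file" 2>/dev/null </dev/null | grep -qi -- "$word"; then
            printf '%s\n' "$file"
        fi
    done
}
```

After sourcing the file, `search_docs word .` prints one matching file name per line.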
If you switch to ripgrep, you can use a preprocessor:
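#!/bin/sh -
# ripgrep passes the file path as $1 and the file contents on stdin.
# Pass empty files through unchanged instead of handing them to a converter.
if [ ! -s "$1" ]; then exec cat; fi
# Pick a converter based on the file extension.
case "$1" in
*.pdf)
    exec pdftotext - -
    ;;
*.doc)
    exec catdoc -
    ;;
*.docx)
    exec docx2txt - -
    ;;
*)
    # Anything else is searched as-is.
    exec cat
    ;;
esac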
Save this to a file, make it executable (chmod 755), and use it with --pre:
rg --pre /path/to/preprocessor word
See the ripgrep guide for tips on reducing the overhead of the preprocessor.
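One way to reduce that overhead is ripgrep's `--pre-glob` option, which limits the preprocessor to files matching the given globs so that everything else is searched directly. The wrapper below is a minimal sketch: the function name `rg_docs` and the `RG_PRE` variable are made up for illustration, and `/path/to/preprocessor` is the same placeholder as above.

```shell
# rg_docs WORD [PATH...]: list files containing WORD, ignoring case.
# Only .pdf, .doc and .docx files are sent through the preprocessor.
# RG_PRE (hypothetical variable) holds the path to the preprocessor script.
rg_docs() {
    rg --pre "${RG_PRE:-/path/to/preprocessor}" \
       --pre-glob '*.pdf' --pre-glob '*.doc' --pre-glob '*.docx' \
       -il "$@"
}
```

With `-il` the output is a list of matching file names, as with the original `grep -ril`; for example, `rg_docs word .`.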