Newest 'tabula-py' Questions


134 questions

0 votes

0 answers

19 views

tabula-py read_pdf not running

I am trying to use tabula-py's read_pdf function to read data from a pdf. import tabula path = "./numbers.pdf" print("reading") data = tabula.read_pdf(path,pages=[2,3],...

Jack

asked Apr 4, 2025 at 12:46

0 votes

0 answers

235 views

handling complex tables with PyMuPdf

My use case contains textual table data, but the column header cells span multiple lines (image shared), which results in bad parsing by PyMuPDF. I have tried Camelot and Tabula as well, ...

Arbaaz Ali

asked Mar 24, 2025 at 16:54

0 votes

0 answers

71 views

Tabula GUI and Tabula-py give a different outcome

I'm trying to extract some data from a PDF table. I used the Tabula.exe app at the beginning, and after selecting the wanted area the resulting CSV is how I want it. I exported the template and I tried ...

Tosamoon

asked Jan 7, 2025 at 13:59

0 votes

0 answers

86 views

convert PDF file data to dataframe in python?

Is there a way to convert the data in the PDF files below to a dataframe? https://www.onrr.gov/document/2018.pdf https://www.onrr.gov/document/2021.pdf I have used 'tabula-py' to convert the above PDFs to a dataframe. ...

emiley mille

asked Aug 14, 2024 at 12:05

0 votes

1 answer

140 views

Tabula-Py getting confused with column names

I have a PDF that has some text at the top of the first page, and then the table starts. The table extends throughout the PDF (156 pages). I want to extract this table into CSV. I have successfully done ...

Devarapalli Vamsi

asked Aug 14, 2024 at 11:16

0 votes

0 answers

119 views

Is there any module other than tabula-py and camelot to extract tables from native pdfs?

I was using tabula-py to extract tabular information and store it in .csv files; however, it fails to understand the structure of the table. [Screenshot of the PDF used as a dataset] Real structure of ...

SY_C_41_PIYUSH DESHMUKH

asked Aug 4, 2024 at 10:37

1 vote

0 answers

253 views

Java not recognized in Python venv on Windows 11

I'm trying to use the tabula-py library in a Python virtual environment on Windows 11. Java is installed on my system, and java -version works outside the venv. However, inside the venv, I get 'java' ...

Andres Aguilar

asked Jun 13, 2024 at 1:28

1 vote

0 answers

59 views

Warnings when I use tabula-py

I get these warnings when I use tabula-py. Apr 24, 2024 10:15:55 AM org.apache.pdfbox.pdmodel.PDDocument importPage WARNING: inherited resources of source document are not imported to destination page ...

Mohammed H Abbadi

asked Apr 24, 2024 at 7:25

1 vote

0 answers

246 views

Fatal Java error when trying to use Tabula-py

All my current code: import tabula pdfpath= "Testpdfs/HSA certCut.pdf" sbc = tabula.read_pdf(pdfpath, stream=True, pages=4, format="CSV")[0] print(sbc) I have a fresh install ...

user23713303

asked Mar 21, 2024 at 15:25

2 votes

0 answers

80 views

Tabula- Last line from each page not getting extracted using python

I have a PDF with 4 pages containing 98 rows of tabular data. However, when I use tabula, the last line from each page is excluded from the final output. Below is the code: import tabula tabula....

Mukesh Dalmia

asked Mar 15, 2024 at 20:52

0 votes

0 answers

102 views

Getting broken text while reading pdf written in eastern language in python

I have been facing a problem for a while. I am working on a project where I have to build a REST API that extracts text from a PDF and produces JSON data from it. The PDF format will be the same every time. And ...

Muhammad Hossain

asked Mar 8, 2024 at 6:42

0 votes

1 answer

21 views

Is there possible the tabula-py extract numeric 007 as 007 instead 7?

I use tabula-py to extract PDF table content, but numeric text such as 010019 or 0007 is always converted to float in the output. Is there any way to make it return the correct value (0007 instead of 7....

tabula-py

Ray Ronnaret

asked Mar 8, 2024 at 2:46

1 vote

1 answer

313 views

Encoding Issue When Attempting to Convert Hindi Script PDF to CSV in Python

I'm currently attempting to convert a PDF file containing Hindi Devanagari script to a CSV file using the fitz library in Python, but when I read in the text I encounter a strange encoding issue. Here ...

cedratcarlisle

asked Mar 4, 2024 at 23:54

1 vote

0 answers

81 views

PDF scraping, tabula py - columns do not correspond with "true" values of PDF file

I am stuck again with PDF scraping: some of the values I obtain do not correspond to the columns they end up in. Basically, I want to obtain a CSV file, but first I want to ...

Michael Picazo

asked Nov 28, 2023 at 16:28

0 votes

1 answer

60 views

Keep Leading Zeros in Converted CSV Using Tabular-Py and Pandas

Is there a way to maintain leading zeros in cells while still using the tabula-py convert_into function? Perhaps by passing something into the 'options' parameter to read them as strings? The ...

Nick08

asked Nov 20, 2023 at 22:20
