Portable document format (PDF) files can be found all over the internet, used to distribute everything from company reports to tax forms. They're easy to display and print on all sorts of devices and to transfer by Web or email. But when it comes to certain operations, especially spreadsheet operations, PDFs can be difficult to work with. Luckily, there are tools to convert data from PDFs into either comma-separated values (CSV), a format that many spreadsheet programs can read, or Microsoft Excel files.
Convert a PDF to CSV
If you receive a PDF with data in a table format in it, you'll often want to run various kinds of analyses on that data. You might want to sum some of the columns in a spreadsheet, compare the information to other data you have or plot it on a bar chart or line graph.
Unfortunately, it's not easy to do that directly from a PDF file. But if you convert the PDF to a CSV file, you can import it into a spreadsheet tool, a database program or many other analysis tools. There are a number of free and paid tools available online and offline to use for PDF to CSV conversion.
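Once the data is in CSV form, even a short script can do the kind of column sum described above. The following is a minimal Python sketch, not a finished tool. The sample data and the column names "region" and "amount" are made up for illustration.

```python
import csv
import io
from decimal import Decimal

# Hypothetical CSV content standing in for a file converted from a PDF table.
SAMPLE_CSV = """region,amount
North,1200.50
South,980.25
West,1310.00
"""

def sum_column(csv_file, column):
    """Add up one numeric column of a CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    # Decimal avoids the rounding surprises of floats when adding money.
    return sum((Decimal(row[column]) for row in reader), Decimal("0"))

total = sum_column(io.StringIO(SAMPLE_CSV), "amount")
print(total)  # prints 3490.75
```

To use it on a real file, you would pass an open file instead of the sample text, for example open("table.csv", newline=""), where table.csv is a placeholder name.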
Consider Optical Character Recognition
In some cases, especially if the PDF is created from a scanned document, it may include only an image of the text, not the characters themselves in a form a computer can read. In this case, you may need to run the document through an optical character recognition (OCR) program that can recognize the text as individual words or numbers.
OCR programs aren't perfect, so it's a good idea to double-check any text or numbers you extract from a PDF this way.
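Part of that double-checking can be automated. OCR software commonly confuses similar-looking characters, such as the letter O and the digit 0, so one simple check is to flag any cell in a numeric column that doesn't look like a number. The Python sketch below assumes a CSV file with a header row. The sample data, the column names and the number format it accepts are illustrative assumptions.

```python
import csv
import io
import re

# Accepts plain numbers such as 42, -3.5 or 1,250.00 (an assumed format).
NUMBER = re.compile(r"-?\d{1,3}(?:,\d{3})*(?:\.\d+)?|-?\d+(?:\.\d+)?")

def suspicious_cells(csv_file, numeric_columns):
    """Return (line_number, column, value) for cells that do not look numeric."""
    problems = []
    reader = csv.DictReader(csv_file)
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        for column in numeric_columns:
            value = (row[column] or "").strip()
            if not NUMBER.fullmatch(value):
                problems.append((line_number, column, value))
    return problems

# Hypothetical OCR output: "1O" contains a letter O, "l2.00" a lowercase L.
SAMPLE_CSV = """item,quantity,price
Widget,12,4.50
Gadget,1O,19.99
Gizmo,7,l2.00
"""

problems = suspicious_cells(io.StringIO(SAMPLE_CSV), ["quantity", "price"])
for line_number, column, value in problems:
    print(f"line {line_number}: {column} = {value!r}")
```

A check like this only catches values that aren't valid numbers. It can't tell you that a 3 was misread as an 8, so comparing totals against the original document is still worthwhile.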
Use Adobe Acrobat
One tool that can convert PDF files to spreadsheets is Adobe Acrobat. Adobe is the company that developed much of the PDF format. You can open a PDF file in the paid version of Acrobat and export to a Microsoft Excel spreadsheet file. If you prefer a CSV, Excel or most other spreadsheet programs can open the file and save it as a CSV.
To convert a file, open it in Acrobat and click "Export PDF." Choose "Spreadsheet" as the export format, then "Microsoft Excel Workbook." Click "Export" and choose where to save the file. If the PDF is scanned, Acrobat will run OCR to extract the text.
You do have to pay to use Acrobat, though there is a free trial available.
Use an Online Tool
There are a number of online tools, many of them free, that can convert PDFs to CSV or spreadsheet files. Some can also run OCR software on scanned files if necessary.
An online service called Convertio will convert PDFs to CSV files. Many conversions are free, but you may have to pay for services like OCR or processing large files.
Another option is SodaPDF, which is available for free online and can convert PDFs into Excel files, Microsoft Word format or Microsoft PowerPoint documents. You upload a PDF, then download the file it generates after it runs the conversion process.
One downside to using an online tool is that you must share the file you're converting with whoever operates the tool. You may not wish to do this if the document is confidential.
Use an Offline Tool
There are also tools that you can use offline to convert a PDF to a CSV file or to other formats that may be more convenient.
One is called Tabula, and it's available for free for Windows, Mac or Linux. It doesn't include OCR capability, so it can't work with scanned PDFs that don't contain embedded text.
There's also an open-source tool called pdf2csv that works with the Python programming language. It's available for free on the open-source program repository GitHub.
You can also use a free tool called PDFMiner that can convert PDFs to text. Another tool, called PDF2HTML, will convert PDFs to Hypertext Markup Language (HTML) Web page files that you can edit as text or view in a Web browser.
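Plain text extracted from a PDF still has to be split into columns before a spreadsheet can use it. The Python sketch below shows one simple approach, under the assumption that the columns in the extracted text are separated by at least two spaces. Real PDF text is often messier than that. The sample text is made up, and getting the text out of the PDF is left to a tool such as PDFMiner.

```python
import csv
import io
import re

def text_table_to_csv(text):
    """Convert space-aligned text lines to CSV, splitting on runs of 2+ spaces."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(re.split(r"\s{2,}", line.strip()))
    return output.getvalue()

# Hypothetical text, as it might come out of a PDF-to-text tool.
SAMPLE_TEXT = """
Item           Qty   Unit price
Blue widget    12    4.50
Gadget, large  3     19.99
"""

csv_text = text_table_to_csv(SAMPLE_TEXT)
print(csv_text)
```

Because the csv module writes the rows, a cell that contains a comma, like "Gadget, large" here, is quoted automatically so it stays in one column when a spreadsheet opens the file.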