Wouldn’t it be great if you could import data directly from PDFs?
Well actually, with Power BI you already can! Power BI natively supports importing data from PDF files. Follow along and I’ll show how you can easily create a simple report from data stored in a PDF file.
For today’s post, I’ve sourced Nestlé Group’s 2017/2018 financials, which are stored in PDF format. Why did I pick Nestlé? Simple: they were the first Google result when I searched for “company financials PDF”.
The PDF data we are starting with is shown in the screenshot below. It’s in table format, but not something we could easily copy and paste into Excel or other tools for further analysis. With Power BI’s Get and Transform tool (Power Query), we’ll convert this into an easy-to-use, structured table format.
We’ll then create some visualizations of this structured data. By the end of this post, you should be able to create your own report from PDF data, similar to the interactive, embedded one below:
To begin loading and transforming data from PDF files, open Power BI Desktop and select Get Data:
Next, select the PDF data type from the “Get Data” window:
Now Power BI will connect to the PDF file. From here, I am selecting Table005, as this is the table with the income statement data from page 6 of the PDF. I click the checkbox near this table (1) and then click Edit (2):
Now the Power Query Editor window opens. With a few applied steps (shown on the right), you can reshape the data to fit your needs. For my requirements, I wanted three columns: year, amount and income statement item. If you want to view the exact data transformation steps I’ve used, you can download the Power BI Desktop file for this blog post.
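If you’re curious what that kind of reshaping looks like outside of Power Query, here is a minimal sketch in plain Python. It assumes the PDF table arrives “wide” (one row per income statement item, one column per year) and turns it into the “long” layout of item, year and amount, which is what an unpivot step typically produces. The column names, the unpivot helper and the numbers are all made up for illustration; they are not the real figures, and not necessarily the exact steps in the desktop file.

```python
# Hypothetical wide table: one row per line item, one column per year.
# Values are placeholders, not real figures from the PDF.
wide_rows = [
    {"Item": "Sales", "2018": 100.0, "2017": 90.0},
    {"Item": "Cost of goods sold", "2018": -60.0, "2017": -55.0},
]


def unpivot(rows, id_column, attribute_name, value_name):
    """Turn every non-id column into its own (attribute, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on each output row instead
            long_rows.append({
                id_column: row[id_column],
                attribute_name: column,
                value_name: value,
            })
    return long_rows


# Reshape to the three columns used in the report: Item, Year, Amount.
long_rows = unpivot(wide_rows, "Item", "Year", "Amount")

for row in long_rows:
    print(row)
```

Each wide row with two year columns becomes two long rows, so the result has one row per item and year combination. That shape is what makes filtering by year or by item straightforward later on.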
Once the data transformation steps are complete, click “Close & Apply”. Power BI will then bring the data into your data model:
Now that we have the data in a ready-to-use structure and format of columns and rows (one nice table), it becomes very easy to build some basic visuals and slicers for our end report!
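To see why the tidy structure makes visuals and slicers easy, here is a rough Python analogue of what happens when you pick a slicer value and chart a total. It is only a sketch under my own assumptions: the rows are placeholder values in the item, year and amount layout, and slice_rows and total_by are hypothetical helper names, not anything Power BI exposes.

```python
from collections import defaultdict

# Hypothetical long-format rows (placeholder values, not real figures).
rows = [
    {"Item": "Sales", "Year": "2017", "Amount": 90.0},
    {"Item": "Sales", "Year": "2018", "Amount": 100.0},
    {"Item": "Cost of goods sold", "Year": "2017", "Amount": -55.0},
    {"Item": "Cost of goods sold", "Year": "2018", "Amount": -60.0},
]


def slice_rows(rows, column, selected):
    """Keep only rows whose value in `column` is selected (a slicer analogue)."""
    return [row for row in rows if row[column] in selected]


def total_by(rows, column):
    """Sum Amount for each distinct value in `column` (a simple chart series)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row["Amount"]
    return dict(totals)


# "Slice" to a single income statement item, then total it per year.
sales_only = slice_rows(rows, "Item", {"Sales"})
sales_by_year = total_by(sales_only, "Year")
print(sales_by_year)
```

Because every row carries its own item and year, one filter plus one group-by covers any combination of slicer selections, with no need to reference individual year columns.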
And there we have it! In this post we showed how you can connect to data locked up in PDF files, transform the data into a usable structure, and create a few visuals to make the report ready for business consumption.
You can view the interactive report on the Power BI service, or download and play around with my Power BI Desktop file for free. I’ve uploaded the desktop file here: https://github.com/zachrenwick/ImportPDFDataPowerBI