Web scraping applied to E-invoicing

Published on 15 October 2019

Product Marketing Manager at Generix Group
B2B Collaboration

Invoice digitization, set out by Directive 2010/45/EU of July 13, 2010, is now becoming widespread in France and elsewhere in Europe. With this regulatory obligation, an entire ecosystem of support services for companies is falling into place. The challenge of the moment: containing the dispersion of provider invoices brought about by digitization. To respond to this issue, automatic document extraction, or web scraping, has emerged as a lasting solution. But just how does the technique work, and why is it so promising? We take a look at this innovative option as applied to electronic invoicing.

Invoice digitization: managing dispersion

The consensus seems to be that by 2021 or 2022, all invoices issued in France will be provided in an electronic format. This invoice digitization requirement implies that professionals store invoices on a reliable medium and be able to present them in the event of a tax inspection.

To meet these requirements and reduce the cost of managing outbound invoices, vendors most often create portals where their clients can log in to download issued invoices. As a consequence, both individual customers and professionals have to log on to multiple sites to retrieve the provider invoices they need to keep and integrate into their accounting. For companies in general, and small and medium-sized companies in particular, such tasks have become quite time-consuming.


Web scraping: an innovative solution

In response to this problem, a solution loosely based on web scraping principles is being developed: automated extraction of invoices from the sites where providers publish them. For companies, this means retrieving all the provider invoices that concern them through an automated process based on APIs.


How does web scraping work?

On the client side, the company enters the login and password for each provider application, then configures the actions needed to schedule automatic extraction of invoices or orders, on a monthly basis for example. Extraction then runs automatically, in real time as the documents are downloaded, which is in itself a massive time-saver for companies.
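The client-side setup described above can be sketched as a small scheduling model. This is only an illustration under stated assumptions: the names ExtractionJob, is_due and due_jobs, the "daily" and "monthly" frequency values, and the provider names are all hypothetical and do not come from any actual product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExtractionJob:
    """One configured extraction for one provider portal (hypothetical model)."""
    provider: str                    # made-up identifier for the provider portal
    login: str
    password: str                    # a real system would use a secrets store
    document_type: str = "invoice"   # or "order"
    frequency: str = "monthly"       # assumed values: "monthly" or "daily"


def is_due(job: ExtractionJob, last_run: Optional[date], today: date) -> bool:
    """Return True when the job should run again on `today`."""
    if last_run is None:
        return True  # the job has never run
    if job.frequency == "daily":
        return today > last_run
    if job.frequency == "monthly":
        # Due as soon as the calendar month has changed since the last run.
        return (today.year, today.month) > (last_run.year, last_run.month)
    raise ValueError(f"unknown frequency: {job.frequency!r}")


def due_jobs(
    jobs: List[ExtractionJob],
    last_runs: Dict[str, date],
    today: date,
) -> List[ExtractionJob]:
    """Select the jobs to launch today, keeping the configured order."""
    return [job for job in jobs if is_due(job, last_runs.get(job.provider), today)]
```

A scheduler would call due_jobs once a day and hand each selected job to the connector for the matching provider.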


Implementation requiring connector creation

To be implemented, this technique requires specific development by the platforms involved. Connectors must be created and linked with provider applications in order to prepare for extraction.

From postal services to public and long-distance transportation, several hundred connectors may need to be created so that invoices can be extracted from this type of portal. Development of this technique therefore depends on provider applications being opened up for connector creation.
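One way to picture the connector approach is a common interface that every provider-specific connector implements, so that the platform can collect invoices from all of them in the same way. The sketch below is an assumption-laden illustration: Invoice, InvoiceConnector, InMemoryConnector and collect_all are hypothetical names, and the in-memory connector merely stands in for code that would actually log in to a provider portal or call its API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Invoice:
    """Minimal invoice record (hypothetical fields)."""
    provider: str
    number: str
    issued_on: date
    content: bytes  # raw document, for example a PDF


class InvoiceConnector(ABC):
    """Contract that each provider-specific connector would fulfil."""
    provider: str

    @abstractmethod
    def fetch_invoices(self, since: date) -> List[Invoice]:
        """Return the invoices issued on or after `since`."""


class InMemoryConnector(InvoiceConnector):
    """Stand-in for a real portal connector; serves canned invoices."""

    def __init__(self, provider: str, invoices: Iterable[Invoice]) -> None:
        self.provider = provider
        self._invoices = list(invoices)

    def fetch_invoices(self, since: date) -> List[Invoice]:
        return [inv for inv in self._invoices if inv.issued_on >= since]


def collect_all(connectors: Iterable[InvoiceConnector], since: date) -> List[Invoice]:
    """Gather invoices from every connector into one chronological list."""
    collected: List[Invoice] = []
    for connector in connectors:
        collected.extend(connector.fetch_invoices(since))
    return sorted(collected, key=lambda inv: (inv.issued_on, inv.provider, inv.number))
```

With this shape, adding a new provider means writing one more InvoiceConnector subclass, which is why the number of connectors grows with the number of portals to cover.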


For further reading: The Case for Interoperable Process Digitization Platforms