The Two-column Resume and ATS systems: The Challenges of Parsing a PDF File

What is an ATS?

An ATS, or Applicant Tracking System, is software used by organizations to manage their recruitment processes.

It handles the entire workflow, from job postings and resume collection to communication with applicants.

One aspect that often concerns job seekers is the resume parsing feature, which pre-screens candidates. Many applicants worry that their resumes might be rejected if the formatting is not properly recognized by the ATS.

Can my resume layout interfere with an ATS?

Well, it is actually difficult to know. ATS are closed systems and their codebase is not public, which makes it hard to determine whether a resume's format or layout affects its ranking. That said, it would be very damaging for a company to reject a perfectly suited candidate just because the system failed to parse the resume correctly.

So should I do something in particular? Maybe keep my resume in a single column?

The most basic ATS parsing tool looks for specific keywords in a resume. To be honest, unless your resume is a plain text file, many factors can affect the ATS software's ability to parse it properly, and this is true even if your resume uses a single column. For example, extra letter spacing in section titles can prevent the ATS from recognizing these titles as words.
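
To make the letter spacing problem concrete, here is a minimal Python sketch of a keyword lookup. It is only an illustration: real ATS products are closed, so the function names `find_keywords` and `collapse_letter_spacing` and the matching logic are our own assumptions, not how any particular ATS works.

```python
import re
from typing import Iterable, Set


def find_keywords(text: str, keywords: Iterable[str]) -> Set[str]:
    """Return the keywords that appear as whole words in text (case-insensitive)."""
    found = set()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(keyword)
    return found


def collapse_letter_spacing(text: str) -> str:
    """Join runs of three or more single letters separated by single spaces."""
    return re.sub(
        r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b",
        lambda match: match.group(0).replace(" ", ""),
        text,
    )


# A section title extracted with extra letter spacing (hypothetical sample).
spaced = "S K I L L S\nScala, Apache Kafka"
keywords = ["skills", "scala"]

print(sorted(find_keywords(spaced, keywords)))
# ['scala']
print(sorted(find_keywords(collapse_letter_spacing(spaced), keywords)))
# ['scala', 'skills']
```

The spaced-out title is invisible to the naive lookup until the letters are joined again, and a parser that does not apply this kind of clean-up simply never sees the word.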

But let's run a quick test.

Our test

To get a better idea of whether a two-column PDF resume is difficult to parse, we ran a quick test with various PDF parsers.

We will submit a mock PDF resume to each of these parsers and check that the resulting plain text:

  • contains every word from the original resume and no extra content.
  • keeps every word in its original section (for example, a certification should not end up in the Languages section).
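
The two checks above can be sketched in Python as follows. This is a rough illustration rather than the exact procedure we followed: the helper names (`words`, `word_diff`, `misplaced_words`) and the idea of passing sections as dictionaries are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def words(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def word_diff(original: str, parsed: str) -> Tuple[Counter, Counter]:
    """Return (missing, extra) word counts between the original and parsed text."""
    expected, got = Counter(words(original)), Counter(words(parsed))
    return expected - got, got - expected


def misplaced_words(
    expected_sections: Dict[str, str], parsed_sections: Dict[str, str]
) -> Dict[str, List[str]]:
    """For each parsed section, list the words that do not belong to it."""
    report = {}
    for name, parsed in parsed_sections.items():
        allowed = set(words(expected_sections.get(name, "")))
        strangers = [w for w in words(parsed) if w not in allowed]
        if strangers:
            report[name] = strangers
    return report


# Hypothetical sample: "Native" drifts from Languages into Skills.
expected = {"Languages": "English Native", "Skills": "Scala Java"}
parsed = {"Languages": "English", "Skills": "Native Scala Java"}

missing, extra = word_diff(" ".join(expected.values()), " ".join(parsed.values()))
print(dict(missing), dict(extra))
# {} {}
print(misplaced_words(expected, parsed))
# {'Skills': ['native']}
```

Note that the first check passes here even though the second one fails: a parser can return every word and still put some of them in the wrong section.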

The PDF resume we are using for the test

Tool 1: PDFTOTEXT

The pdftotext utility in Ubuntu / Linux relies on the Poppler library, which is an open-source PDF rendering library. Poppler itself is based on Xpdf, an older PDF viewer and library.

It is an old library, yet it is very popular and very fast.

Experience
Senior Engineer
Google - San Francisco

June 2019 - Now

• Led a team to overhaul the backend messaging system based on

ARTHUR
WALKER

Scala Software
Engineer
Contact
Address :
Crocker, San Francisco, USA
Phone :
+1 (555) 555 5555
Email :
arthur.walker@dummym.com

Languages
English
Native
Japanese Started learning
in 2020

Links
LinkedIn
Github

Skills

Apache Kafka. This involved developing new components and
architectures, optimizing existing systems, and ensuring that the
system is secure and performant.
• Responsible for designing and developing data pipelines and APIs,
maintaining uptime and scalability, and providing technical guidance
and support to other teams.

Engineer - Scala & Kafka
Twitter

January 2015 - May 2019

• Developed and deployed microservices using Scala , Go in a team
of 4 coders
• Analyzed the existing systems and identified areas of improvement
• Developed and deployed high-availability and fault-tolerant services
• Collaborated with other teams to ensure smooth delivery of
services, and adapt microservices REST routes

Backend Engineer
Freshworks

Jan 2011 - Nov 2014

• Leveraged Google Cloud Platform to manage a Kafka cluster
deployed in Kubernetes and managed various aspects of the cluster,
such as scaling, security, and performance.
• Developed custom connectors using Kafka Connect API to integrate
with other Google services such as BigQuery and Dataproc.

Freelance Ruby On Rails coder
Self employed

Dec 2006 - Aug 2010

For various clients :
• Developed and maintained Ruby on Rails web applications to
enable users to store and access data on demand
• Designed and implemented a RESTful API utilizing Ruby on Rails
and Postgresql to integrate with existing systems
• Leveraged TDD to ensure the quality and correctness of the code
• Created custom rake tasks to automate routine tasks
• Collaborated with other developers to ensure the application was
up to industry standards

Scala
Apache Kafka

Education

Java
Golang

BS in Computer Science

Ruby on Rails

UC Berkeley - Berkeley

Mongo DB
HTML5 / CSS3

2001 - 2005



As you can see:

  • Every word is present
  • The content of the two columns is mixed up. The output starts with the Experience section, so the main column is parsed first, but after the first line of the applicant's most recent work experience it switches to the side column data.
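
If you want to try this kind of extraction yourself, here is a minimal Python sketch that calls the pdftotext command line tool. It assumes pdftotext is installed and on your PATH, and `resume.pdf` is a placeholder file name. The `-layout` and `-raw` options are real pdftotext flags that change how the text is ordered, so it may be worth comparing them on a two-column document. The helper names are our own.

```python
import subprocess
from typing import List

# pdftotext ordering modes: default, physical layout, or content stream order.
_MODE_FLAGS = {"default": [], "layout": ["-layout"], "raw": ["-raw"]}


def pdftotext_command(pdf_path: str, mode: str = "default") -> List[str]:
    """Build the pdftotext command line; "-" sends the text to stdout."""
    if mode not in _MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["pdftotext", *_MODE_FLAGS[mode], pdf_path, "-"]


def pdf_to_text(pdf_path: str, mode: str = "default") -> str:
    """Run pdftotext and return the extracted plain text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, mode),
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext exits with an error
    )
    return result.stdout


# Example usage (requires pdftotext and an actual PDF file):
# for mode in ("default", "layout", "raw"):
#     print(f"--- {mode} ---")
#     print(pdf_to_text("resume.pdf", mode))
```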

Tool 2: Apache Tika 2.6.0

Apache Tika is a content analysis toolkit that detects and extracts metadata and text from various document types (not only PDF).

ARTHUR
WALKER

Crocker, San Francisco, USA

+1 (555) 555 5555

arthur.walker@dummym.com

English Native  
Japanese Started learning

in 2020
 

LinkedIn
Github

Google - San Francisco

Twitter

Freshworks

Self employed

Scala Software
Engineer

Contact

Address :

Phone :

Email :

Languages

Links

Skills

Scala
Apache Kafka
Java
Golang
Ruby on Rails
Mongo DB
HTML5 / CSS3

Experience

Senior Engineer
June 2019 - Now

• Led a team to overhaul the backend messaging system based on
Apache Kafka. This involved developing new components and
architectures, optimizing existing systems, and ensuring that the
system is secure and performant. 
• Responsible for designing and developing data pipelines and APIs,
maintaining uptime and scalability, and providing technical guidance
and support to other teams.

Engineer - Scala & Kafka
January 2015 - May 2019

• Developed and deployed microservices using Scala , Go in a team
of 4 coders
• Analyzed the existing systems and identified areas of improvement
• Developed and deployed high-availability and fault-tolerant services
• Collaborated with other teams to ensure smooth delivery of
services, and adapt microservices REST routes

Backend Engineer
Jan 2011 - Nov 2014

• Leveraged Google Cloud Platform to manage a Kafka cluster
deployed in Kubernetes and managed various aspects of the cluster,
such as scaling, security, and performance.
• Developed custom connectors using Kafka Connect API to integrate
with other Google services such as BigQuery and Dataproc. 

Freelance Ruby On Rails coder
Dec 2006 - Aug 2010

For various clients :
• Developed and maintained Ruby on Rails web applications to
enable users to store and access data on demand
• Designed and implemented a RESTful API utilizing Ruby on Rails
and Postgresql to integrate with existing systems
• Leveraged TDD to ensure the quality and correctness of the code
• Created custom rake tasks to automate routine tasks
• Collaborated with other developers to ensure the application was
up to industry standards

Education

BS in Computer Science
UC Berkeley - Berkeley 2001 - 2005

mailto:arthur.walker@dummym.com
https://www.linkedin.com/
https://www.github.com/

Here with Tika:

  • Every word is present in the final text
  • The content of the two columns is still mixed up, though the result is slightly better than with the Poppler library. The parser starts with the data from the side column. It also extracts data that is missing from the Poppler output, such as the email address and the links' URLs. And the content of the Experience and Education sections is not mixed up with another section.

Apache Tika gives better results than Poppler, especially for the main sections, Experience and Education, whose content is not mixed up with the other column and is reproduced faithfully (although the company names are emitted earlier in the output, detached from their job titles). If an ATS uses a semantic analyzer, it is more likely to produce good results from this output.
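
There are several ways to run Tika (command line app, server, Java library). As one possible setup, the sketch below sends a PDF to a Tika Server instance over HTTP using only the Python standard library. It assumes a server is already running locally on its default port 9998; the file name `resume.pdf` and the helper names are placeholders of ours.

```python
import urllib.request

# Assumption: a local Tika Server listening on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(pdf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika Server for the plain text of a PDF."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def tika_extract_text(pdf_path: str, url: str = TIKA_URL) -> str:
    """Send the PDF to Tika Server and return the extracted text."""
    with open(pdf_path, "rb") as pdf_file:
        request = build_tika_request(pdf_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example usage (requires a running Tika Server and an actual PDF file):
# print(tika_extract_text("resume.pdf"))
```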

Tool 3: PyPDF2

PyPDF2 is also a popular tool, but it gives mixed results on our resume. The content of the Experience section is preserved, yet it seems to have difficulty parsing sections with shorter content, such as lists. It also does not respect the reading flow as well as the previous tools. Excerpt:

ARTHUR
WALKER
Crocker, San Francisco, USA
+1 (555) 555 5555
arthur.walker@dummym.com
English Native  
JapaneseStarted learning
in 2020 
LinkedIn
GithubGoogle -San F rancisco
Twitter
Freshworks
Self emplo yed
Scala Softwar e
Engineer
Contact
Address :
Phone :
Email :
Languages
Links
Skills
Scala
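
For reference, extracting text with PyPDF2 takes only a few lines. The sketch below assumes PyPDF2 2.x or later, where the reader class is `PdfReader` and pages expose `extract_text()`; `resume.pdf` is a placeholder and the helper names are ours.

```python
def extract_text(reader) -> str:
    """Concatenate the text of every page of a PyPDF2-style reader."""
    # extract_text() may return nothing for a page without text, hence the fallback.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def load_resume_text(pdf_path: str) -> str:
    """Open a PDF with PyPDF2 and return its text."""
    # Imported here so extract_text() can be tried without PyPDF2 installed.
    from PyPDF2 import PdfReader

    return extract_text(PdfReader(pdf_path))


# Example usage (requires PyPDF2 and an actual PDF file):
# print(load_resume_text("resume.pdf"))
```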

Tool 4: Parsr

Parsr is a tool based on PDFMiner that adds OCR tools, such as Tesseract, on top of it. It is developed by the French company AXA and can be found here: https://github.com/axa-group/Parsr

It is a modular and highly customizable tool, which makes it slightly more difficult to use and probably less versatile than the other tools tested above. It was difficult to get consistent results for our resume, and the best results we obtained did not exceed those of Tika or pdftotext.

Conclusion

Parsing PDFs is notoriously tricky, and most parsers struggle to read a document the way a human would. However, three out of four parsers still output every word that appears on the original resume. All words from the Experience and Education sections are reproduced, and one parser, Tika, even keeps the content of both sections from being mixed with the other column.

With advanced tools like the Adobe API and recent breakthroughs in artificial intelligence, I'm confident that any resume will soon be parsed accurately.

At CVdunk, we're all in on two-column resumes! This format fills a gap, offers unique advantages, and may be the perfect fit for many candidates. Create your resume today with CVdunk and get the best of our single- and two-column resume templates!