You Should Use Pandoc
Jul 11, 2024
Pandoc bills itself as "a universal document converter." That's what it is. You can find installation instructions here.
To be more specific, Pandoc is a command-line program (i.e., you run it in the terminal) for converting documents between various file formats. Here are some things I use Pandoc for:
- Generating PDF reports from Markdown
- Generating Google Docs from Markdown
- Converting Markdown articles to HTML for my website 1
- Website templating and HTML generation
- Performing transformations on HTML documents (e.g., link rewriting)
As you can see, I prefer to use Markdown as a source format and convert to other formats, but Pandoc can convert in both directions for most of its supported formats.
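To illustrate the reverse direction, here is a minimal sketch that turns a small HTML file back into Markdown. The file names and sample content are made up for the example, and it assumes Pandoc is already installed.

```shell
# Create a small HTML file to convert (hypothetical sample content).
cat > sample.html <<'EOF'
<h1>Hello</h1>
<p>Some <em>emphasized</em> text.</p>
EOF

# Convert HTML to Markdown. Swap the --from and --to values to go the other way.
pandoc --from html --to markdown sample.html -o sample.md

# Print the result to the terminal.
cat sample.md
```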
For people who write documents
(i.e., everyone.)
Without Pandoc, you have basically two options for writing serious documents that become PDFs or another typeset format:
- Write it in LaTeX and use a LaTeX engine to generate a PDF
- Write it in a word processor like Microsoft Word, Apple Pages, or Google Docs, and print a PDF
These are both fine most of the time. But if you happen to be a STEM student who needs to typeset mathematical or scientific writing, you pretty much have to use LaTeX. LaTeX is still a fine option, but it's pretty ugly, archaic, and limited. What if you want to generate an HTML file for a blog post as well as a PDF? That's awkward to do in LaTeX alone. Pandoc enables this: use LaTeX as a source format and HTML as an output. It also allows you to write in Markdown and generate LaTeX... or even skip the middleman and have Pandoc call your LaTeX engine to generate a PDF in one step. I find that Markdown is a better source format most of the time when you are dealing with technical content, but even when it isn't, Pandoc is still a great tool. Pandoc lets you write in a number of nicer formats and still generate a beautifully typeset PDF.
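As a rough sketch of the "one source, several outputs" idea, the commands below take a single Markdown file containing TeX math and produce HTML, LaTeX, and PDF from it. The file name notes.md and its content are hypothetical, and the PDF step assumes a LaTeX engine such as pdflatex is installed alongside Pandoc.

```shell
# A tiny Markdown source with inline TeX math (hypothetical sample content).
# The quoted 'EOF' keeps the shell from touching the dollar signs and backslashes.
cat > notes.md <<'EOF'
# Euler's identity

The identity $e^{i\pi} + 1 = 0$ links five constants.
EOF

# HTML for the web. --standalone emits a complete page, --mathml renders the math as MathML.
pandoc notes.md --from markdown --standalone --mathml -o notes.html

# LaTeX source, in case you want to inspect or hand-edit it.
pandoc notes.md --from markdown --standalone -o notes.tex

# PDF in one step. Pandoc runs the LaTeX engine for you.
pandoc notes.md --from markdown -o notes.pdf
```

Pandoc may warn that the HTML page has no title set. That is harmless here, and adding a title in the file's metadata silences it.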
How to use
You can try it online here or you can download it for your system and then run:
pandoc --from markdown --to html5 article.md -o article.html
where article.md is the filename of your Markdown source. This will generate an article.html file.
Or, to generate a PDF, assuming you have LaTeX already installed, try this:
pandoc --from markdown article.md -o article.pdf
Maybe you don't like the look of the output. Indeed I think the defaults are not great. Peruse the many options for more precise control over the generated document. Yes, I'm telling you to RTFM.
...or you can try my command and see if you like it.
Some defaults
Here's what I used to generate the PDF of my short stories for my writing workshop, using a Markdown source.
Put this YAML header at the top of your Markdown file (or figure out how to put these directives in the command):
---
title: My Title
author: My Name
date: 11 July 2024
documentclass: article
fontfamily: mathptmx
linestretch: 2
indent: 4m
pointsize: 14p
geometry:
- margin=1in
header-includes:
- \renewcommand{\rule}[2]{\begin{center} * * * \end{center}}
---
Then generate the PDF using this command (assumes you have pdflatex; you could also use xelatex or lualatex, but I cannot vouch that my YAML will work with those):
pandoc article.md -f markdown-latex_macros -t pdf --pdf-engine=pdflatex -o article.pdf
That should get you started. Now play with it!
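If you would rather keep the directives out of the Markdown file, several of the YAML fields can be passed on the command line instead, using --metadata for document metadata and --variable for template variables. This is only a sketch of that approach: it covers the title, author, document class, font, line spacing, and margin, and leaves out indent, pointsize, and header-includes. The placeholder article.md content is made up.

```shell
# Create a placeholder article.md if you don't already have one (hypothetical content).
[ -f article.md ] || printf '# Chapter One\n\nIt was a dark and stormy night.\n' > article.md

# The same settings as flags rather than a YAML header.
pandoc article.md \
  --from markdown-latex_macros \
  --pdf-engine=pdflatex \
  --metadata title="My Title" \
  --metadata author="My Name" \
  --variable documentclass=article \
  --variable fontfamily=mathptmx \
  --variable linestretch=2 \
  --variable geometry:margin=1in \
  -o article.pdf
```

Keeping the settings in the file travels better with the document, while flags are handy when you want the same look across many files.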
Markdown caveats
Markdown has many different "flavors," which is a euphemism for saying nobody can agree on what syntactic forms are permitted in Markdown. Every Markdown-to-HTML converter obeys different rules. Fortunately, Pandoc supports pretty much everything through its extensions. Most of the time Pandoc will be tolerant, but you might have to manually enable features like special syntax for tables. Again, RTFM.
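Here is a small sketch of how extensions are switched on, assuming Pandoc is installed. Extensions are appended to the format name with + (enable) or - (disable). The table file and its content are hypothetical.

```shell
# List the extensions available for Pandoc's Markdown.
# A leading "+" means enabled by default, "-" means disabled by default.
pandoc --list-extensions=markdown

# A pipe table, one of the table syntaxes that not every flavor accepts (sample content).
cat > table.md <<'EOF'
| Tool   | Language |
|--------|----------|
| Pandoc | Haskell  |
EOF

# Original strict Markdown with just the pipe_tables extension turned on.
pandoc --from markdown_strict+pipe_tables --to html table.md -o table.html

# Or read the file as GitHub-Flavored Markdown, which includes pipe tables.
pandoc --from gfm --to html table.md -o table-gfm.html
```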
For developers
Pandoc is also a Haskell library. You can do some fabulous things with it if you know Haskell.
1. This page was written in Markdown and converted to HTML using Pandoc.