
Working with Jupyter

By Angela C

April 16, 2021 in nbconvert

Reading time: 7 minutes.

In this post I will make some notes on creating content for this blog from Jupyter notebooks. My notebooks usually get too long to use as a post so I want to be able to create a summary post of the notebook and include any plots, images etc.

The following packages are available.

  • Dataframe image is a package to convert Jupyter Notebooks to PDF and/or Markdown embedding pandas DataFrames as images.
  • nbconvert is a package for converting notebooks to other formats such as HTML, markdown, PDF and more.
  • nb2hugo for converting a Jupyter notebook to a Hugo markdown page.

nbconvert

A Jupyter notebook can easily be downloaded in various formats from within Jupyter Notebook using the File / Download As menu. This downloads the notebook in the specified format to the downloads folder, and from there the converted notebook can be moved to wherever you want it. So far this is the easiest way but also the least flexible, as the entire notebook is converted.

nbconvert itself offers many more options. For example, you can exclude all code cells, all code cell inputs or all cell outputs.

nbconvert can also be used as a library. I have not used this yet but I’ll have a look at this another time.
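As a rough idea of what library use might look like, here is a minimal sketch that uses nbconvert's MarkdownExporter to convert one notebook with the code inputs hidden. The function names notebook_to_markdown and write_markdown and the file names are made up for illustration, and nbconvert is imported inside the function so the snippet loads even where the package is not installed.

```python
from pathlib import Path


def notebook_to_markdown(notebook_path, exclude_input=True):
    """Convert one notebook to markdown text plus extracted resources."""
    # Imported lazily so this file can be loaded without nbconvert installed.
    from nbconvert import MarkdownExporter

    exporter = MarkdownExporter()
    # Same trait as --TemplateExporter.exclude_input on the command line.
    exporter.exclude_input = exclude_input
    body, resources = exporter.from_filename(str(notebook_path))
    return body, resources


def write_markdown(notebook_path, out_dir="."):
    """Hypothetical helper: write the markdown and any extracted images."""
    body, resources = notebook_to_markdown(notebook_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / (Path(notebook_path).stem + ".md")
    md_path.write_text(body, encoding="utf-8")
    # resources["outputs"] maps output file names to raw bytes (plots etc.).
    for name, data in resources.get("outputs", {}).items():
        target = out_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return md_path
```

Calling write_markdown("mynotebook.ipynb", "converted") would then be the rough equivalent of the command line conversion, assuming the notebook exists.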

A Jupyter notebook can also be converted to another format from the command line. This is what I have been using, following the documentation on using nbconvert as a command line tool.

jupyter nbconvert --to FORMAT notebook.ipynb will convert the Jupyter notebook file into the format given by the ‘FORMAT’ string. There are many supported output formats including HTML, PDF, LaTeX and Markdown. For now I am only looking at markdown and HTML formats.

  • --to markdown

  • --to html

  • Multiple notebooks can also be converted together.

Configuration Options

Configuration options may be set in a file or at the command line when starting nbconvert.

Various App options are listed here including the following:

  • NbConvertApp.notebooks to provide a list of notebooks to convert. Default is [].

  • NbConvertApp.output_base to override the base name used for output files. This can only be used when converting one notebook at a time. The default is ''.

  • NbConvertApp.output_files_dir to specify the directory to copy extra files (figures) to. Any ‘{notebook_name}’ in the string will be converted to the notebook basename. The default is '{notebook_name}_files'.
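Setting these options in a file might look something like the sketch below. The file name jupyter_nbconvert_config.py and the notebook name are only examples; get_config() is supplied by Jupyter when it loads the file, so the file is not meant to be run directly with python.

```python
# jupyter_nbconvert_config.py
# get_config() is injected by Jupyter when it loads this file;
# it is not defined if the file is run directly.
c = get_config()

# Notebooks to convert (example name only).
c.NbConvertApp.notebooks = ["mynotebook.ipynb"]
# Equivalent of --to markdown on the command line.
c.NbConvertApp.export_format = "markdown"
# Folder for extracted figures; {notebook_name} becomes the notebook basename.
c.NbConvertApp.output_files_dir = "{notebook_name}_files"
```

The file can then be passed to nbconvert with jupyter nbconvert --config jupyter_nbconvert_config.py.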

Exporter Options

These options are set to False by default.

  • TemplateExporter.exclude_code_cell : Bool to exclude code cells from all templates if set to True.

  • TemplateExporter.exclude_input : Bool to exclude code cell inputs from all templates if set to True.

  • TemplateExporter.exclude_output : Bool to exclude code cell outputs from all templates if set to True.

  • The following command will produce a markdown file with neither the code input nor the code output.

jupyter nbconvert --to markdown --TemplateExporter.exclude_input=True --TemplateExporter.exclude_output=True  my_notebook.ipynb

(This is probably the same as setting TemplateExporter.exclude_code_cell to True.)

  • This command converts the notebook to markdown, hides the code input cells and additionally specifies the name of the output file.

jupyter nbconvert --to markdown --TemplateExporter.exclude_input=True mynotebook.ipynb --output myconvertednotebook.md
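The flags used in the commands above can also be assembled from Python, which may be handy when converting several notebooks with the same settings. This is only a sketch: nbconvert_command and convert are hypothetical helper names, and the code simply builds and runs the same command line with subprocess.

```python
import subprocess


def nbconvert_command(notebook, to="markdown", exclude_input=False,
                      exclude_output=False, output=None):
    """Build the argument list for a jupyter nbconvert call."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if exclude_input:
        cmd.append("--TemplateExporter.exclude_input=True")
    if exclude_output:
        cmd.append("--TemplateExporter.exclude_output=True")
    if output:
        # Base name for the output file (one notebook at a time only).
        cmd += ["--output", output]
    cmd.append(notebook)
    return cmd


def convert(notebook, **options):
    """Run the command; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, **options), check=True)
```

For example, convert("mynotebook.ipynb", exclude_input=True, output="myconvertednotebook") would run the second command shown above, assuming jupyter is on the PATH.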

Another option is to remove only certain cells, inputs or outputs instead of all code cells.

See Removing cells, inputs, or outputs.

Cells can be removed by using regular expressions on cell content or by using cell tags. I have used tags. To do this, select Tags from the View / Cell Toolbar menu and add or edit tags for whatever cells you want to remove from the converted output.

For example, this command removes specific cells that are marked by tags. The tags are provided as a list of strings.

jupyter nbconvert mynotebook.ipynb --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['remove_cell','remove_cells','remove']" --to markdown
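To make the tag mechanism a little more concrete: tags are stored in each cell's metadata inside the .ipynb JSON. The sketch below is not how nbconvert's TagRemovePreprocessor is implemented; it is a standard-library stand-in that drops tagged cells from a notebook before conversion. The function names drop_tagged_cells and strip_notebook_file are made up for illustration.

```python
import json

REMOVE_TAGS = ("remove_cell", "remove_cells", "remove")


def drop_tagged_cells(notebook, remove_tags=REMOVE_TAGS):
    """Return a copy of a notebook dict without cells carrying any of the tags."""
    remove = set(remove_tags)
    kept = [
        cell for cell in notebook.get("cells", [])
        # Tags live under cell["metadata"]["tags"]; untagged cells have no key.
        if not remove.intersection(cell.get("metadata", {}).get("tags", []))
    ]
    return {**notebook, "cells": kept}


def strip_notebook_file(src, dst, remove_tags=REMOVE_TAGS):
    """Read a notebook file, drop tagged cells and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(drop_tagged_cells(notebook, remove_tags), f, indent=1)
```

The stripped copy could then be converted with nbconvert as usual, leaving the original notebook untouched.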

The same can be applied when converting the Jupyter notebooks to other formats such as HTML by substituting --to markdown with --to html.

I have experimented with converting a few Jupyter notebooks in the Notebooks section of this blog, which I will remove again as they are nonsense and just for testing! When a ‘.ipynb’ file is converted to markdown, any plots are saved as images to a folder at the same level. For example, converting a Jupyter notebook ‘mynotebook.ipynb’ in the Notebooks folder to markdown format results in a markdown file ‘mynotebook.md’ in the same folder, as well as a folder ‘mynotebook_files’ containing any images for plot output etc. So far I have had to edit the image links in the markdown output to go up one level, for example ![png](../mynotebook_files/image_name.png).
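Editing the image links by hand could be replaced with a small script. The following is a sketch only: fix_image_links is a hypothetical helper, and it assumes the links in the converted markdown have the form ![png](mynotebook_files/image_name.png).

```python
import re


def fix_image_links(markdown, files_dir, prefix="../"):
    """Prefix image links that point into files_dir, e.g. to go up one level."""
    # Match the '![alt](' part followed directly by the files folder name.
    pattern = re.compile(r"(!\[[^\]]*\]\()(" + re.escape(files_dir) + r"/)")
    return pattern.sub(lambda m: m.group(1) + prefix + m.group(2), markdown)
```

Links that already start with ../ are left alone, because the pattern only matches when the folder name comes straight after the opening bracket.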


The Data Carpentry Reproducible Science Curriculum has a post on exporting Jupyter notebooks. The post suggests specifying the basic template when exporting to HTML format if you want to embed the resulting HTML into a blog post instead of using the default full template which includes headers etc.

jupyter nbconvert my_notebook.ipynb --to html --template basic --output output.html

Jupyter notebooks that are converted to markdown files need to have front matter added to the top to be included in a Hugo post. Otherwise there will be an error when building the site: ‘plain HTML documents not supported’.
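Adding the front matter could also be scripted. This is a minimal sketch that assumes YAML front matter with just a title and a date; add_front_matter is a hypothetical helper and the field names would need to match whatever the Hugo theme expects.

```python
import json


def add_front_matter(markdown, title, date, extra=None):
    """Prepend YAML front matter unless the text already starts with one."""
    # Assumption: a leading '---' line means front matter is already present.
    if markdown.lstrip().startswith("---"):
        return markdown
    fields = {"title": title, "date": date}
    fields.update(extra or {})
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps gives a double-quoted, escaped string that YAML also accepts.
        lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown
```

The converted markdown file would be read, passed through this function and written back before building the site.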


There is also a nb2hugo package by Vincent Lunot for converting a Jupyter notebook to a Hugo markdown page which is based on the nbconvert package. I’m not sure if you can include / exclude code cells with this but otherwise it seems a good option. You need to have a markdown cell at the very top of the notebook that will contain the front matter information.


Printing dataframe tables as images

There are a few ways to deal with the problem of a pandas DataFrame table from a Jupyter notebook not displaying in a markdown document. Usually the result of the last line of code in a cell is displayed in the Jupyter notebook without having to call print. For example, to show the top five rows of a dataframe named df I usually just type df.head(). This displays it nicely formatted (pretty printing I think!) in the notebook, but when converted to markdown you just see the HTML table tags. If you use print() instead, the dataframe output will print but without the additional styling.

The DataFrame image package allows you to embed pandas DataFrames as images in pdf and markdown files when converting from Jupyter Notebooks. It will also convert any plots in the notebook to image format.

  • pip install dataframe_image to install the package. Once the package is installed, the option DataFrame as Image (PDF or Markdown) will appear in the menu File -> Download as. If you select this option, a form pops up where you select PDF, Markdown or Both. Click the download button and a zip file will appear in the downloads folder, containing the converted document in markdown or PDF as well as a folder of images. The folder has the same name as the notebook with _files appended to it, and it contains images of the dataframe tables and any plot images.

You can also export individual dataframes inside the .ipynb notebook.

Use import dataframe_image as dfi to import the package, then call dfi.export() with the DataFrame and the name of the output file.

You can use the dfi.export function to save normal and styled DataFrames as png image files. The documentation provides an example of this.
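As a small illustration of dfi.export, the sketch below saves the first few rows of a DataFrame as a png. The helper name export_head and its export parameter are made up for this example; dataframe_image is imported inside the function, and as far as I understand it renders through an installed Chrome browser by default.

```python
def export_head(df, filename, rows=5, export=None):
    """Save the first rows of a DataFrame as a png image file."""
    if export is None:
        # Imported lazily so the helper can be defined without the package.
        import dataframe_image as dfi
        export = dfi.export
    # dfi.export takes the DataFrame (or a styled one) and the output file name.
    export(df.head(rows), filename)
    return filename
```

Inside a notebook this would be called as export_head(df, "df_head.png"), and the resulting image can then be linked from the markdown post.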

I have an example in the notebooks section of this blog.


Convert Jupyter notebook to markdown, turning off code input cell content.

jupyter nbconvert index.ipynb --to markdown --TemplateExporter.exclude_input=True --NbConvertApp.output_files_dir=.

This is what I am currently using for converting Jupyter notebooks with the code cells turned off to shorten the notebook (the trailing . sets output_files_dir to the current folder).


Notes

  • I need to look at creating scripts for doing this rather than converting each notebook on the command line and then having to change the links to the images to one folder up.

  • I could just use the dataframe_image package to download the notebook in markdown and then move the converted notebook and output into the relevant section of the blog.

  • The markdown document will need to have front matter added to it to appear as a clickable link in the list of posts or notebooks.


Stack Overflow

  • jupyter nbconvert YourNotebook.ipynb --no-input --to html
  • jupyter nbconvert Irish_weather_stations_yesterday.ipynb --no-input --no-prompt (to align cells to the right)
