<--- Notes

Owning my Content

Textual content is everywhere: websites, notes, books, emails, presentations, and developer documentation. It's just presented differently.

It all has a similar structure: Headers, sub-headers, paragraphs, bolded and italicized words, lists, tables, images, and links. The tool you use just may be different.

Word, PDF, HTML, Apple Notes, why is it so hard to convert between them? Why can't it just be plain text?

Markdown and the unified collective is the answer.

What's Markdown?

Markdown is a universal language that lets you write plain text documents that contain everything that a Word document does and more.

Instead of using buttons on a property platform, or writing HTML code yourself to add headers and such, you just add a character or two, like a '#' to signify the text is a header. # Header ## Subheader

You then take your plain text document anywhere that supports Markdown (a lot of places these days) and they can render it to look how you want (book, word document, website, etc).

Markdown Flowchart

What's the Unified Collective?

Markdown language and syntax need to be transformed into your final desired format (HTML, PDF, etc), and even vice-versa (HTML to Markdown). The Unified Collective builds the tools to do this.

Transforming content has 3 main steps:

  1. Parsing: Content needs to be turned into a "syntax tree" as specified by unist. Unist doesn't exactly specify what a syntax tree is for different content forms like Markdown/HTML input. It just defines a universal abstract syntax tree and expects projects to extend unist for specific languages. Here are popular specifications for different languages:

    • mdast - Specifies Markdown abstract syntax trees.
    • hast - Specifies HTML abstract syntax trees.
    • xast - Specifies XML abstract syntax trees.
    • esast - Specifies JavaScript abstract syntax trees.

    The projects above still only specify abstract syntax trees. To get a concrete syntax tree there's another collection of projects that do the actual parsing:

  2. Transformation: The syntax trees (usually JSON format) can then be transformed into whatever you want by the ecosystem of plugins that exist for the concrete syntax trees.

    For example, the remark-pdf plugin can turn markdown files into PDFs. If a plugin you want doesn't exist, you can create your own. It's easy(ish) since you're working with a concrete syntax tree.

    While some plugins can completely change your concrete tree type, like the remark-rehype plugin which turns mdast (markdown) trees into hast (HTML) trees, other plugins may just change the style by adding extra spaces, or checking for errors (linting) and outputting warning messages.

  3. Stringify: The transformed syntax tree is then stringified into its final form (HTML, markdown, prose/natural language). The output depends on the plugins you used.

An example: transforming markdown to HTML

import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import rehypeSanitize from "rehype-sanitize";
import rehypeStringify from "rehype-stringify";

main();

async function main() {
  const file = await unified()
    .use(remarkParse) // specify the parser for the input file
    .use(remarkRehype) // turn the mdast tree into a hast tree
    .use(rehypeSanitize) // use a rehype plugin to ensure the HTML is safe
    .use(rehypeStringify) // stringify the hast tree into HTML
    .process("# Hello, Neptune!"); // do the 3 steps: parse, transform, and stringify using the tools above.

  console.log(String(file)); // --> <h1>Hello, Neptune!</h1>
}

Markdown's limitations

Standard CommonMark or GitHub Flavored Markdown is great for static text, links, and images, but it falls short on interactive content like charts and widgets.

JavaScript and ReactJS libraries are modern ways to create dynamic and interactive websites. But standard Markdown language doesn't support these.

MDX extends the markdown syntax with JSX syntax. MDX files use the .mdx extension instead of the .md extension.

How to process an MDX file

MDX files can be processed using a long list of remark plugins, starting with remark-mdx which transforms the remark syntax tree (based on the mdast specification) into one that supports the MDX syntax.

You'd then need a lot of plugins afterward to parse the tree and stringify it into a JSX Fragment.

Thankfully, the mdx-js project built a collection of packages to help you process an MDX file into React Fragments.

MDX Processors

There are many MDX processers, but at the root of all of them is the @mdx-js/mdx compiler which uses a ton of remark plugins to turn MDX into a JSX Fragment. See the mdx architecture for how this works.

However, if you're using a bundler (see my other note on what that is) like webpack, Rollup, or esbuild, then you'll want to use something like the @mdx-js/loader package. Adding this package into your webpack config will then enable you to import a .mdx file and get a React fragment returned.

import Article from "./article.mdx";
export default function Page() {
  return <Article />;
}

But even more, if you're using a site builder like NextJS or Gatsby and you want to link directly to .mdx pages as you would for .html or .jsx pages, then you can use packages like @next/mdx or gatsby-plugin-mdx which further extend packages like @mdx-js/loader.

Assuming you have pages/about.mdx, in NextJS you can do something like:

import Link from "next/link";
export default function HomePage() {
  return <Link href="/about">About Page</Link>;
}

Processing MDX On Demand

Linking to MDX pages directly is great (as with @next/mdx or gatsby-plugin-mdx) but what if your .mdx file is in a remote repository, or if you don't know the page ahead of time because you're doing dynamic routing (importing files during runtime)? This was my issue, I had a NextJS project structure laid out like this:

├── content
|  └── notes
|     ├── about
|     └──  index.mdx
|     └──  photo.png
|     └──  component.js
├── pages
|  ├── index.js
|  ├── notes
|  |  ├── index.js
|  |  └── [slug].js
|  ├── _app.js
|  └── _document.js

Inside notes/[slug].js I was doing dynamic routing to get the MDX note based on the URL route:

import { getFileBySlug, getSlugs } from "@/lib/mdx";
import MdxLayout from "@/components/MdxLayout";

export async function getStaticPaths() {
  const slugs = getSlugs();
  return {
    paths: slugs.map((slug) => {
      params: {
        slug: slug;
      }
    }),
    fallback: false,
  };
}

export async function getStaticProps({ params }) {
  const slug = params.slug;
  let { content, metadata } = getFileBySlug(slug);
  return { props: { content, metadata } };
}

export default function SpecificNote({ content, metadata }) {
  const { title } = metadata;
  return (
    <MdxLayout>
      <h1>{title}</h1>
      // MDX content rendered somehow?????????????
    </MdxLayout>
  );
}

For rendering, I start with the content variable which is the raw text from the .mdx file. I need to convert this into a react fragment which contains HTML and JavaScript. Luckily, three popular projects can do this:

So which do I choose? Here's an overview of what each does:

  • @mdx-js/mdx
  • next-mdx-remote
      1. Serializes() the MDX by removing imports and exports from your MDX file and then calling compile() from @mdx-js/mdx which does steps 1-8 above and return a JSX Fragment as a string.
      1. Renders the MDX with its MDXRemote component that takes a few options:
      • source: the serialized MDX string
      • lazy: boolean, if true it returns an empty div while the React component is created from the source string.
      • components: lets you pass components for a specific MDX file you're rendering
    • This is essentially abstracts a few steps you'd have to do and figure out if you had used the @mdx-js/mdx directly, not much more.
  • mdx-bundler
      1. Creates a custom esbuild plugin using a variety of config options you give it.
      1. Then creates a config for esbuild using its plugin, the @mdx-js/esbuild plugin, and the extra esbuild config options you pass to it.
      1. Bundles your MDX file and returns a React component + frontmatter for you to use (b/c it includes the remark-frontmatter plugin by default).
    • The key value prop of this package over the two above is that it lets you use import/export inside your MDX file. Importing/exporting is code-splitting, which requires bundlers like webpack or esbuild to find the files and piece them together.
    • The downside is that you're adding another bundler (esbuild) to your project which makes this package a bit heavy.

The package I chose

I settled for mdx-bundler. Here's why:

  • To recap, I had a folder content/notes/[slug]/ with files: index.mdx, photo.png, ButtonComponent.js. The .mdx file is written in plain text but it's supposed to be rendered with the pictures and the JavaScript it imports.
  • I wanted to get the rendered package on the fly. Using the URL I'd match it to a "slug" and specific note.
  • Although dynamically importing files with, await import("content/notes/slug/index.mdx") and using the @next/mdx webpack loader returns a React component, it has to be done fully on the client-side because NextJS only allows static and serializable content to be passed as props to pages. Thus, I can't use the default MDX loaders.
  • The two packages: @mdx-js/mdx and next-mdx-remote let you compile .mdx files into a JavaScript that builds and by default exports a React component when you run it. Thus, while you can stringify this and pass it as a prop to pages, you can't import pictures or components inside your .mdx files because these packages aren't bundlers so they can't resolve imports/exports.
  • So, I'm left with mdx-bundler. Despite its large size, it does everything I need it to:
    • [x] I can store and use pictures and components right alongside the .mdx pages. My content can be anywhere.
    • [x] I can compile .mdx files during build time inside getStaticProps() (the most computationally expensive work) and then construct the React component quickly on the client-side during runtime.

The final result

Same file structure as described above, here's what I wrote:

import MdxLayout from "@/components/MdxLayout";
import { formatDate } from "@/lib/utils";
import { useMemo } from "react";
import path from "path";
import { bundleMDX } from "mdx-bundler";
import { getMDXComponent } from "mdx-bundler/client";
import remarkGfm from "remark-gfm";
import remarkMdxImages from "remark-mdx-images";

// at build time, generate a map of all possible paths
export async function getStaticPaths() {
  const slugs = getSlugs();
  return {
    paths: slugs.map((slug) => {
      return { params: { slug: slug } };
    }),
    fallback: false,
  };
}

// at build time, map every path with ready-to-construct mdx component
export async function getStaticProps({ params }) {
  const slug = params.slug; // URL keyword (eg. "about")
  const slugDir = path.join(process.cwd(), "content", "notes", slug); // content/notes/about
  const publicDir = path.join(process.cwd(), "public", "static"); // public/static

  let loaderImages = {}; // {".png":"file", ".jpg":"file"}
  [".png", ".jpg"].forEach((ext) => (loaderImages[ext] = "file"));

  const { code, frontmatter } = await bundleMDX({
    file: `${slugDir}/index.mdx`, // entry point for esbuild to start resolving imports/exports
    cwd: `${slugDir}/`, // relative path that imports/exports will be using, like "./photo.png"
    bundleDirectory: `${publicDir}/notes/${slug}/`, // bundle images into this public folder
    bundlePath: `/static/notes/${slug}/`, // load all images with the prefix public subfolder prefix
    esbuildOptions: (options) => {
      options.loader = {
        ...options.loader, // custom mdx-bundler and @mdx/esbuild loader
        ...loaderImages, // our images are interpreted as files
        ".js": "jsx", //  .js files should be treated as .jsx files so they can export components
      };
      options.target = [
        // bundle to older js syntax so that older browsers support it
        "es2020",
        "chrome58",
        "firefox57",
        "node12",
        "safari11",
      ];
      options.define = {
        "process.env": JSON.stringify(process.env), // share  environment variables with esbuild
      };
      return options;
    },
    mdxOptions(options) {
      // this is the recommended way to add custom remark/rehype plugins:
      // The syntax might look weird, but it protects you in case we add/remove
      // plugins in the future.
      options.remarkPlugins = [
        ...(options.remarkPlugins ?? []),
        remarkGfm,
        remarkMdxImages,
      ];
      options.rehypePlugins = [...(options.rehypePlugins ?? [])];

      return options;
    },
  });

  return { code, metadata: getMetadata(slug, frontmatter) };
}

// at runtime, construct the mdx component only when the code prop changes
export default function SpecificNote({ code, frontmatter }) {
  const { title, lastmod } = frontmatter;
  const Component = useMemo(() => getMDXComponent(code), [code]);
  return (
    <>
      <MdxLayout lastmod={formatDate(lastmod)}>
        <h1>{title}</h1>
        <Component />
      </MdxLayout>
    </>
  );
}

Conclusion

Learning about MDX and specifically how to process it on-demand forced me to dive in deep on how @mdx-js/mdx, next, and bundlers like esbuild and webpack work under the hood.

I learned about abstract syntax trees (great AST visualizer), how to build a simplified bundler, and came across new, to me, JavaScript syntax - new AsyncFunction(String(file))(options) - and JavaScript packages (string_decoder, Reflect) that I had to parse or ask the community.

Overall, it was a fun way to spend a week of my Christmas break.