A backend-backed blog is great because it enables a lot of features. My previous blog was a TomEE-backed implementation which provided features such as scheduled Twitter posts, dynamic updates and so on. However, it has several pitfalls: it must be maintained somehow, upgrades must follow security vulnerabilities (which tend to pop up at the worst possible time) and so on. Let's see why I decided to move the technical stack of this blog, the few challenges I faced and how I solved them.

Technology stack maintenance and context

When the blog was originally developed, 5 years ago now, I was very involved in Apache TomEE, and Angular "NG" was a bit new but very relevant for a full stack developer and quite simple, so the choice was easy.

If we compare these choices to today's world:

  • I slowly moved from Apache TomEE to Apache Meecrowave or even plain Apache OpenWebBeans/CDI SE API (with an embedded HTTP server),

  • Angular >= 2 became a full industry and is quite slow (to compile/transpile/build but also to run, even with Ivy) compared to competitors such as React or Vue.

So if I had to pick a technical solution today, I would start from a CDI SE application and probably use React.

However, in the meantime, static generation became more popular, GitHub opened private repositories to everyone (which makes it possible to store content without publishing it) etc…​

Reviewing the way I was managing the documentation for most of the Apache projects I'm involved in, I realized most of them use a custom documentation generator - understand a script-like generator written in Java. At the same time, I developed for the company I currently work for (Yupiik) a similar tool in the minisite Maven plugin. Even if it was originally documentation oriented, this goal has some blogging capabilities.

So overall, it was time to evaluate using a static generator for my blog.

Where was I starting from?

Each time a migration occurs you need to check what you accept to lose (or not) and what you gain.

For this blog, it was almost OK to lose all the backend features - I can schedule a tweet with a lot of free apps today - but I was not ready to lose the post URLs I had before.

In terms of gains, I was now obviously able to write content in asciidoc - and no longer through a UI in the blog itself - and to have a backup as .adoc files instead of a database dump.

To start the migration I needed to move the database content to the minisite layout. The starting point was to export all the content as .adoc files. Luckily my previous blog had a "dump content as JSON" button. The output looked like:

{
  "date": "2021-03-22T05:45:56.317",
  "categories": [ (2)
    {
      "id": 51,
      "name": "Java/EE/Microprofile",
      "slug": "javaee",
      "color": "#fe005e"
    },
    ...
  ],
  "posts": [ (1)
    {
      "id": 151,
      "title": "Bye bye wordpress.com, Hello RBlog!",
      "author": "rmannibucau",
      "slug": "bye-bye-wordpress",
      "type": "POST",
      "content": "<p>Blogging is probably the easiest way to share some tips and experiences. ...",
      "summary": "I started blogging in 2012 on wordpress.com ...",
      "published": "2016-05-26 23:49:00",
      "created": "2016-05-26 23:49:00",
      "updated": "2016-05-26 23:49:00",
      "categories": { (2)
        "id": 53,
        "name": "Other"
      }
    },
    ...
  ],
  ...
}

The interesting parts are:

  • Post content is exported (sadly as HTML),

  • Categories are available in the export and can be joined with the post data to find their metadata.

Migration

Migrate the content

The first step of the migration was to convert this dump into .adoc files.

To do that I first needed to read the JSON dump. For that I used JSON-B (Apache Johnzon more exactly) and Java 16 records:

public record Category(long id, String name, String slug, String color) {}

public record Post(long id, String title, String author, String slug, String type, String content, String summary,
                   String published, String created, String updated, Category categories) {}

public record Dump(String date, List<Category> categories, List<Post> posts) {}

From that model, reading the dump is as simple as the following - indeed I downloaded it beforehand, no need to automate it through the REST API for a one-shot migration:

private Dump readDump() throws Exception { // Jsonb#close declares Exception
    try (final var jsonb = newJsonb(); // helper creating the JSON-B instance (sketched below)
         final var reader = Files.newBufferedReader(Paths.get("backup.json"))) {
        return jsonb.fromJson(reader, Dump.class);
    }
}
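A minimal newJsonb() can simply be the standard JSON-B bootstrap - a sketch assuming the javax.json.bind flavor of Johnzon 1.2.x (adjust the imports for the jakarta classifier), the real helper may add configuration:

import javax.json.bind.Jsonb;
import javax.json.bind.JsonbBuilder;

private Jsonb newJsonb() {
    // resolves Apache Johnzon as the JSON-B provider since it is the one on the classpath
    return JsonbBuilder.create();
}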

Once the dump is read, the posts are available through dump.posts(). The idea is to create a src/main/minisite/content folder and, for each post, to write a file from the post item:

private void migrate(final Post post, final AtomicInteger remaining, final long total, final Path content,
                                    final Map<Long, String> categories) {
    final var out = content.resolve(post.slug() + ".adoc"); (1)
    try {
        Files.writeString(out,
                "= " + post.title() + "\n" + (2)
                        (3)
                        ":minisite-blog-published-date: " + post.published().substring(0, post.published().indexOf(' ')) + "\n" +
                        ":minisite-blog-categories: " + categories.get(post.categories().id()) + "\n" +
                        ":minisite-blog-authors: Romain Manni-Bucau\n" +
                        ":minisite-blog-summary: " + post.summary().replace("\n", " ").trim() + "\n" +
                        "\n" +
                        (4)
                        "++++\n" +
                        post.content() + "\n" +
                        "++++\n");
    } catch (final IOException e) {
        throw new IllegalStateException(e);
    }
}
1 Each post has a slug which is the URL marker of the post, so we simply use it as the file name,
2 The post title is a simple 1-1 mapping to the document title,
3 The minisite metadata enables generating the post entry on the listing pages (per category, per author, all posts sorted by date),
4 As a first step the HTML content (from the dump) is injected as-is in the asciidoc file (since the blog will be rendered with the html5 backend it is sufficient).
In practice this method is wrapped in a loop and even a multi-threaded implementation but this is not the core of the migration so I skip it - a minimal driver sketch follows.
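A single-threaded version of that driver could look like the following sketch (migrateAll is a hypothetical name; the categories map indexes the dump categories by id, the real mapping likely also normalizes the names to match the blog category customizations configured below; toMap is the Collectors one):

private void migrateAll() throws Exception {
    final var dump = readDump();
    final var content = Files.createDirectories(Paths.get("src/main/minisite/content"));
    // index the dump categories by id to resolve each post category name
    final var categories = dump.categories().stream()
            .collect(toMap(Category::id, Category::name));
    final var total = dump.posts().size();
    final var remaining = new AtomicInteger(total);
    for (final var post : dump.posts()) {
        migrate(post, remaining, total, content, categories);
    }
}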

Set up the minisite/blog generator

At that stage we can enable the minisite mojo in the project pom:

<plugin> (1)
  <groupId>io.yupiik.maven</groupId>
  <artifactId>yupiik-tools-maven-plugin</artifactId>
  <version>1.0.9</version>
  <executions>
    <execution> (2)
      <id>generate-site</id>
      <phase>process-classes</phase>
      <goals>
        <goal>minisite</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <siteBase>/</siteBase> (3)

    (4)
    <logoText>RManniBucau</logoText>
    <logoSideText>Blog</logoSideText>
    <copyright>RManniBucau &amp;copy;</copyright>
    <logo>/images/logo.png</logo>
    <linkedInCompany>rmannibucau</linkedInCompany>
    <indexText>RManniBucau Blog</indexText>
    <indexSubTitle>An opiniated IT blogging.</indexSubTitle>
    <skipIndexTitleDocumentationText>true</skipIndexTitleDocumentationText>

    (5)
    <customHead>&lt;link rel=&quot;stylesheet&quot; href=&quot;/css/blog.css?v=${project.version}&quot;&gt;</customHead>

    (6)
    <blogPageSize>5</blogPageSize>
    <blogCategoriesCustomizations>
      <JavaEEMicroprofile>
        <order>1</order>
        <icon>fab fa-java</icon>
        <description>Java related posts.</description>
      </JavaEEMicroprofile>
      <Frontend>
        <order>2</order>
        <icon>fab fa-js</icon>
        <description>Frontend development.</description>
      </Frontend>
      <BigData>
        <order>3</order>
        <icon>fa fa-database</icon>
        <description>Big Data technology.</description>
      </BigData>
      <Other>
        <order>4</order>
        <icon>fa fa-random</icon>
        <description>All other posts.</description>
      </Other>
    </blogCategoriesCustomizations>
  </configuration>
</plugin>
1 Define the plugin which generates the blog,
2 Define the actual execution of the minisite mojo to generate the blog during the process-classes phase (optional, it can be replaced with the mvn yupiik-tools:serve-minisite -e command),
3 Force the base of the site (for assets) to be the root context,
4 Customize the default theme texts (home page mainly),
5 Inject a custom CSS file into the default theme (it will sit in src/main/minisite/assets/css/blog.css),
6 Customize the blog generation, here by reducing the number of posts per page (pagination) and customizing the categories (order, icon and description).

If I run the minisite dev server: mvn yupiik-tools:serve-minisite, I can check the blog at http://localhost:4200.

Handle redirects

There are multiple options to handle the redirections.

A first idea can be to generate the exact same file layout (src/main/minisite/content/post/<slug>) and inject in the "old" file the HTML generated by the adoc→html conversion. This is not a bad idea but not all servers will serve an extension-less file as HTML, which can create some issues.

In the end, my server will stay a Tomcat instance (but upgraded to v10) so I can use Tomcat features. You maybe thought about a servlet filter; it would work perfectly but there is actually a way to not code it at all: RewriteValve. This valve (a kind of lower-level Tomcat filter) is close to HTTPd mod_rewrite and enables working on the Tomcat Request object before it hits the servlet chain. In other words: it can rewrite the incoming request to make it look like another one when entering the servlet chain - the DefaultServlet in our case, which will serve the static HTML files.
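For reference, the servlet filter alternative I did not keep could look like this minimal sketch (the class name and mapping are mine; it assumes Tomcat 10 - so the jakarta.servlet API - the root context and the /post/<slug> legacy URLs shown in the example right after):

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.annotation.WebFilter;
import jakarta.servlet.http.HttpFilter;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

import java.io.IOException;

@WebFilter("/post/*") // only intercept the legacy URLs
public class LegacyPostFilter extends HttpFilter {
    @Override
    protected void doFilter(final HttpServletRequest request, final HttpServletResponse response,
                            final FilterChain chain) throws IOException, ServletException {
        final var slug = request.getRequestURI().substring("/post/".length());
        if (!slug.isEmpty() && !slug.contains("/")) {
            // forward to the generated static page, the DefaultServlet will serve it
            request.getRequestDispatcher('/' + slug + ".html").forward(request, response);
            return;
        }
        chain.doFilter(request, response);
    }
}

The RewriteValve makes such a class useless, which is why I went the valve way.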

For example, my last post was /post/apache-meecrowave-access-log and after the migration it will be /apache-meecrowave-access-log.html, so I want to rewrite the request path from the first pattern to the second one.

The first step to handle these mappings was to extract the mapping between the old and new post URLs. Since we iterate over all the posts we could indeed store all mappings, but the rule is actually simpler: if the path starts with /post/, capture what is after this prefix, drop the prefix and append .html. As a rewrite rule it can be written like this:

RewriteRule ^/post/([^/]+)$ /$1.html [L]

If we split this line into segments: we define a rewrite rule (rewrite the request), match a request on /post/<slug> extracting the <slug> part, replace this path by /<slug>.html and finally quit the rewrite evaluation ([L] - optional since we have a single rule ;)).
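If you prefer the exhaustive mapping mentioned earlier (one rule per post), a small sketch over the dump could generate the file - the target path is an assumption and it supposes the slugs contain no regex metacharacters (joining is the Collectors one):

private void writeRewriteRules(final Dump dump) throws IOException {
    final var rules = dump.posts().stream()
            .map(post -> "RewriteRule ^/post/" + post.slug() + "$ /" + post.slug() + ".html [L]")
            .collect(joining("\n", "", "\n"));
    Files.writeString(Paths.get("src/main/webapp/WEB-INF/rewrite.config"), rules);
}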

Now I put this file in the war as WEB-INF/rewrite.config and define a META-INF/context.xml containing the rewrite valve declaration:

<?xml version="1.0" encoding="utf-8"?>
<Context>
  <Valve className="org.apache.catalina.valves.rewrite.RewriteValve" />
</Context>

And now we are good.

Packaging

In terms of packaging it is mainly a matter of putting the generated website in an exploded war called ROOT to serve it on the root context (/).

What is interesting to note in this phase is that:

  • using a folder instead of an archive enables a better diff process to update the server (whatever technology you use),

  • once the first migration is uploaded you only upload .html files (and optionally a few assets), so the webapp can be generated only once and even manually - even if maven-war-plugin works well if you prefer.

Updates

Now the website is functional and deployed, but the last question is how to update/add content to the new blog. Before, it was mainly about connecting to the admin layer, writing the content and saving it.

Now the process is:

  • Go in the blog project,

  • Write the content in src/main/minisite/content/<my-new-blog-slug>.adoc - without forgetting the minisite blog metadata,

  • Optionally check out the result using mvn yupiik-tools:serve-minisite,

  • Upload the new files.

The very interesting point is that all of this process except the last step is offline, compared to before, which is a big win; plus the content is written in a wiki-like format and not directly in HTML.

About the upload process: I have FTP access to the server (thanks Metawerx :)) so I first looked at lftp or similar commands to get a kind of rsync between my local folder and the server, but the process was not accurate enough for this flow so I ended up writing a main I can run when I want to update the blog.

The logic is:

  1. List all FTP files in the ROOT webapps (almost only the posts and index pages),

  2. List all blog files in the minisite output folder.

When both are the same it means there is nothing to update, when both are different there is a diff.

The synchronization main implements a simple "mostly work" logic:

  1. all files missing on the remote server must be uploaded,

  2. all files which changed must be uploaded (here we rely on the size, which is not perfect but generally works),

  3. never delete any file to prevent errors (this can still be done manually when really needed but it is not that common for a blog).

In terms of code I rely on the commons-net and component-runtime-junit libraries. The first one gives me an FTP client and the last one a way to read my FTP server password from my Maven servers (ciphered):

<dependency>
  <groupId>commons-net</groupId>
  <artifactId>commons-net</artifactId>
  <version>3.8.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-junit</artifactId>
  <version>1.31.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion> <!-- no need of transitive deps -->
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
I use the provided scope to ensure these dependencies are not bundled in the war when using maven-war-plugin. Similarly the plugin excludes target/classes from being bundled since it only contains generation code.

The next step is to create an FTP client pool class to be able to execute this process somewhat concurrently (concurrency is optional, for example if your FTP server throttles accesses):

import org.apache.commons.net.ftp.FTPClient;
import org.talend.sdk.component.maven.Server;

import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

public class FTPClientPool implements AutoCloseable {
    private final BlockingQueue<FTPClient> clients;

    public FTPClientPool(final int count, final Server credentials, final String host, final int port) throws IOException {
        try {
            clients = new ArrayBlockingQueue<>(count);
            for (int i = 0; i < count; i++) {
                final var client = new FTPClient();
                client.connect(host, port);
                if (!client.login(credentials.getUsername(), credentials.getPassword())) {
                    throw new IllegalArgumentException("Can't log in");
                }
                clients.add(client);
            }
        } catch (final RuntimeException | IOException re) {
            close();
            throw re;
        }
    }

    public <T> T withClient(final Function<FTPClient, T> fn) throws InterruptedException {
        final var client = clients.take();
        try {
            return fn.apply(client);
        } finally {
            clients.add(client);
        }
    }

    @Override
    public void close() {
        clients.forEach(it -> {
            try {
                it.disconnect();
            } catch (final IOException e) {
                // no-op
            }
        });
        clients.clear();
    }
}

This class has no real trick: it just creates a queue of FTP clients and, when withClient is called, provides a client to the caller as soon as one is available.
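As a quick illustration only (the real usage is in the synchronization main below), and assuming it runs in a method declaring throws Exception with credentials already resolved, using the pool looks like:

try (final var ftp = new FTPClientPool(2, credentials, "ftp.example.com", 21)) {
    final var workingDirectory = ftp.withClient(client -> {
        try {
            return client.printWorkingDirectory(); // any commons-net FTPClient call fits here
        } catch (final IOException e) {
            throw new IllegalStateException(e);
        }
    });
    System.out.println("Connected, current directory: " + workingDirectory);
}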

To diff local and remote files we'll use a record (or a POJO if you can't use records) just comparing the (relative) file path and the size:

record FileSpec(String name, long size) {}

The main is now as simple as:

public final class SynchronizeFTP {
    public static void main(final String... args) throws Exception {
        final var credentials = new MavenDecrypter().find("blog-ftp"); (1)
        final var pool = Executors.newWorkStealingPool(); (2)
        try (var ftp = new FTPClientPool(Integer.getInteger("ftp.concurrency", 1), credentials, "xxxx.net", 21)) { (3)
            (4)
            final var remoteFiles = listFiles(pool, ftp, "/webapps/ROOT")
                    .toCompletableFuture()
                    .get();

            (5)
            final var sourceFolder = Paths.get("target/rmannibucau-blog");
            final var localFiles = listFiles(sourceFolder);

            (6)
            final var diff = new ArrayList<>(localFiles);
            diff.removeAll(remoteFiles);

            if (!diff.isEmpty()) {
                ftp.withClient(client -> {
                    diff.forEach(file -> { (7)
                        final var from = sourceFolder.resolve(file.name());
                        try (final var stream = Files.newInputStream(from)) {
                            client.storeFile("/webapps/ROOT/" + file.name(), stream);
                        } catch (final IOException e) {
                            throw new IllegalStateException(e);
                        }
                    });
                    return null;
                });
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
1 we fetch from ~/.m2/settings.xml the credentials to use to connect to the FTP server,
2 we create a thread pool to be able to execute some tasks concurrently (optional),
3 we create the FTP client pool connecting to the server (the concurrency is read from the ftp.concurrency system property),
4 we list files from the remote FTP server but only in the "webapp"/blog folder (we ignore server files) - so we use /webapps/ROOT here,
5 we do the same locally in the minisite output folder (this assumes mvn yupiik-tools:minisite was executed before),
6 we diff both file sets based on the previous logic; here it is as simple as removing from the local files the ones already on the server (thanks to records or POJO equals/hashCode for making this code that simple ;)),
7 if there is any file to upload, we grab an FTP client from our pool and upload each file from the minisite output folder.
The real code has a bit more logging to know what is happening.

Listing the local files is easy with the Java NIO file API since it mainly relies on Files.walkFileTree:

private static List<FileSpec> listFiles(final Path path) throws IOException {
    final var files = new ArrayList<FileSpec>(256);
    Files.walkFileTree(path, new SimpleFileVisitor<>() {
        @Override
        public FileVisitResult visitFile(final Path file, final BasicFileAttributes attrs) throws IOException {
            final var filename = file.getFileName().toString();
            (1)
            if (("war-tracker".equals(filename) || "MANIFEST.MF".equals(filename)) &&
                    file.getParent() != null && "META-INF".equals(file.getParent().getFileName().toString())) {
                return FileVisitResult.SKIP_SUBTREE;
            }
            files.add(new FileSpec(path.relativize(file).toString().replace(File.separatorChar, '/'), Files.size(file)));
            return super.visitFile(file, attrs);
        }

        @Override
        public FileVisitResult preVisitDirectory(final Path dir, final BasicFileAttributes attrs) throws IOException {
            (1)
            if ("maven".equals(dir.getFileName().toString()) &&
                    dir.getParent() != null && "META-INF".equals(dir.getParent().getFileName().toString())) {
                return FileVisitResult.SKIP_SUBTREE;
            }
            return super.preVisitDirectory(dir, attrs);
        }
    });
    return files;
}
1 the only trick in this code is to ignore non-blog-related resources generated when using maven-war-plugin (the manifest is useless here, the maven folder too, and war-tracker is a Tomcat marker file). None of these files is generated if you don't use the default tooling - or the tooling can be configured to skip them - but it is easier to not prevent the use of any tooling and to integrate this exclusion list.

The code listing the FTP files is similar, but there is no walk utility so we use a standard recursive logic:

private static CompletionStage<Collection<FileSpec>> listFiles(final ExecutorService pool, final FTPClientPool ftp,
                                                               final String dir) throws IOException {
    final var files = new ArrayList<FileSpec>(256);
    final FTPFile[] listFiles;
    try {
        listFiles = ftp.withClient(client -> { (1)
            try {
                return client.listFiles(dir);
            } catch (final IOException e) {
                throw new IllegalStateException(e);
            }
        });
    } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
    }
    final var promise = new CompletableFuture<Collection<FileSpec>>();
    try {
        pool.submit(() -> { (2)
            var facade = CompletableFuture.<Void>completedFuture(null); (3)
            try {
                for (final FTPFile ftpFile : listFiles) {
                    (4)
                    if (("maven".equals(ftpFile.getName()) || "war-tracker".equals(ftpFile.getName()) || "MANIFEST.MF".equals(ftpFile.getName())) &&
                            dir.endsWith("META-INF")) { // ignore these ones
                        continue;
                    }
                    (5)
                    if (ftpFile.isFile()) {
                        final var remoteFile = new FileSpec(ftpFile.getName(), ftpFile.getSize());
                        synchronized (files) {
                            files.add(remoteFile);
                        }
                    } else if (ftpFile.isDirectory()) {
                        facade = facade (6)
                                .thenCompose(ignored -> {
                                    try {
                                        return listFiles(pool, ftp, dir + '/' + ftpFile.getName());
                                    } catch (final IOException e) {
                                        throw new IllegalStateException(e);
                                    }
                                })
                                .thenApply(list -> list.stream()
                                        .map(f -> new FileSpec(ftpFile.getName() + '/' + f.name(), f.size()))
                                        .collect(toList()))
                                .thenAccept(list -> {
                                    synchronized (files) {
                                        files.addAll(list);
                                    }
                                });
                    }
                }
                facade.whenComplete((r, e) -> { (7)
                    if (e != null) {
                        promise.completeExceptionally(e);
                    } else {
                        promise.complete(files);
                    }
                });
            } catch (final RuntimeException re) {
                promise.completeExceptionally(re);
            }
        });
    } catch (final RuntimeException re) { (8)
        promise.completeExceptionally(re);
    }
    return promise;
}
1 we start by listing the children of a starting folder (we start from /webapps/ROOT),
2 we will process the output in a thread pool (once again it depends whether your FTP server supports it or not) to try to make it faster,
3 our method returns a CompletionStage (promise) but we must only complete it when nested folders have also been visited, so we create another promise which stacks the nested folder visits and resolves the returned promise when all nested folders are processed (reactively),
4 we exclude the same files and folders as for the local listing,
5 if the FTP file is a regular file we just append it to our result list,
6 if the FTP file is a folder we launch a recursive processing and stack it on our promise facade to be notified when it is done,
7 we wait for the facade completion (error or success) to complete the resulting promise and return the result to the caller,
8 it is important to always handle the synchronous error cases with promises otherwise callers can end up awaiting indefinitely; since a thread pool submission can fail in some cases we must catch exceptions there too.

Now, when I want to update my blog I can just run that main and I know exactly which files changed and were uploaded (the diff is logged in practice).

Test it in Tomcat

To test in Tomcat - a real Tomcat, not an embedded flavor - the easiest way I found was to use the Apache TomEE Maven plugin…​ yes, the TomEE plugin supports a vanilla Tomcat ;).

I used maven-war-plugin which makes it easy to bundle the generated site while excluding the project classes (generation/migration code) and the dependencies (thanks to the provided scope).

Then it is just a matter of configuring Tomcat as the distribution in the TomEE plugin and forcing the context to be ROOT, and you can start the blog with mvn [package] tomee:run:

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.8.1</version>
    <configuration>
      <source>16</source>
      <target>16</target>
      <release>16</release>
    </configuration>
  </plugin>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <version>3.3.1</version>
    <configuration>
      <failOnMissingWebXml>false</failOnMissingWebXml>
      <!-- ignore classes since it is a fully static website -->
      <classesDirectory>${project.build.directory}/skip-classes</classesDirectory>
    </configuration>
  </plugin>
  <plugin> <!-- to test or prepare an instance (version upgrade) -->
    <groupId>org.apache.tomee.maven</groupId>
    <artifactId>tomee-maven-plugin</artifactId>
    <version>8.0.5</version>
    <configuration>
      <context>ROOT</context>
      <removeDefaultWebapps>true</removeDefaultWebapps>

      <tomeeGroupId>org.apache.tomcat</tomeeGroupId>
      <tomeeArtifactId>tomcat</tomeeArtifactId>
      <tomeeVersion>10.0.4</tomeeVersion>
      <tomeeClassifier>ignore</tomeeClassifier>
    </configuration>
  </plugin>
  <plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
      <execution>
        <id>sync</id>
        <phase>none</phase>
        <goals>
          <goal>java</goal>
        </goals>
        <configuration>
          <mainClass>com.github.rmannibucau.blog.migration.ftp.SynchronizeFTP</mainClass>
          <classpathScope>compile</classpathScope>
          <includeProjectDependencies>true</includeProjectDependencies>
          <stopUnresponsiveDaemonThreads>true</stopUnresponsiveDaemonThreads>
        </configuration>
      </execution>
    </executions>
  </plugin>
  <plugin>
    <groupId>io.yupiik.maven</groupId>
    <artifactId>yupiik-tools-maven-plugin</artifactId>
    <version>1.0.9-SNAPSHOT</version>
    <executions>
      <execution>
        <id>generate-site</id>
        <phase>process-classes</phase>
        <goals>
          <goal>minisite</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <!-- same as before -->
    </configuration>
  </plugin>
</plugins>

Conclusion

Indeed the natural temptation is to reuse as much as possible and reduce the cost, but for something as personal and custom as a blog, and since Java can now almost be used as a scripting language (even if not interpreted), it is also important to ensure you use a tool adapted to you rather than a mix of a ton of tools giving something only almost OK. Maintenance is key and being static enables easier tuning and a safer future.

The more complex the tooling, the more chances it has to fail. The tooling is now really bare minimum for my blog: I don't rely on a JavaScript frontend or an advanced WYSIWYG widget to have something human friendly etc…​ just two pieces of Java code I can easily tweak and very few dependencies (actually only Asciidoctor for the generation and commons-net for the upload as important dependencies), so the risk is way lower than before, when Angular, TypeScript, CKEditor, TomEE, CXF, …​ were on the path. This is also true for the runtime: no more database, only files, a "by construction" backup, easier output (I could now output a PDF if I want, whereas before it would have been complicated) and faster site loading (even if it already had some optimizations).

I also note that the combination with Tomcat to handle the URL migration is very neat and avoids requiring third-party tooling (another proxy or a custom filter).
