Yeah, automating the tagging would be useful, but not at the top of my current concerns for the time being. My bigger blockers sit earlier in the pipeline. Here’s a brief rundown:
OpenAI Usage
Our company maintains strict policies regarding AI tool usage, which poses a problem for my manual migration plan which relies on a custom GPT project using o3. We do have an approved contract with FueliX for wide-ranging company AI use, but obtaining approval for dotAI tooling would be unlikely. However, if dotCMS provided me with the ability to add a request proxy field to my dotAI setup, I can use FueliX within the existing dotAI ecosystem. You continue to build dotCMS with the openai package like normal, but with the proxy set, Fuel iX essentially just hijacks the request and returns responses like a normal OpenAI API request/response. This implementation should be relatively straightforward and if you have any interest in exploring the option further, let me know. From the outside looking in, it definitely seems like something that would be quite easy to implement and I think it could be a big deal.
The other issue with my AI-assisted migration approach is that we’d have to get special approval to use OpenAI/ChatGPT since it would be handling proprietary articles (potentially containing company IP). The thing is, only with o3 have I been able to get an output that meets the requirements. I looked into using Fuel iX but it only allows me to create GPT projects using 4o. The Fuel iX API has o3-mini, but because ChatGPT and GPT Projects only allow you to use o3, I have no idea if developing something that uses o3-mini would even work. What a headache…
Summary
- Company policy restricts usage of dotAI/OpenAI.
- dotCMS would need to implement proxy field to allow for use of dotAI + FueliX.
- Manual migration requires o3 model capabilities but FueliX only offers 4o, or o3-mini via API.
Images
Our articles rely on diagrams and step-by-step graphics. Manually dragging images into place works, but I’m at a loss for a scalable, scripted approach that:
- uploads the image assets,
- rewrites their URLs, and
- preserves the original placement inside each article.
Summary
- Images must import automatically and appear in-line.
- Need a repeatable script for upload + URL rewrite.
- Manual fallback exists but doesn’t scale.
HTML Transformation
Our team was able to extract HTML/CSS/JS from Xyleme articles with really good output quality. We experimented with creating a JobAidLegacy model that stores raw HTML/JS/CSS in the content model’s body field, as a WYSIWYG field type, switched to code. The markup renders perfectly, matching what users are used to viewing on the previous system - but the moment an editor switches from Code to WYSIWYG, all functionality, style, and structure is lost and cannot be reversed. Because these articles are updated regularly, in effect, every update would require rebuilding the body from scratch, either in the legacy model way or migrating to the new JobAid with Block body, which defeats the purpose of automating the migration. In addition, it would drive the back end users crazy to try to navigate or extract content from a raw HTML code view of the content (WYSIWYG field switched to code).
Summary
- Current HTML dump renders but is unmaintainable.
- Switching WYSIWYG field mode strips critical JS/CSS and HTML structure.
- Need an approach that supports future edits without having to rebuild the item.
EN/FR Variants
In Xyleme, English and French variants are created as 2 separate documents, then linked to one another via hyperlink at the top of the article body. There are a few ways we could tackle this from either the programmatic or manual approach. We can’t assume that all documents have an EN/FR counterpart, so we’d need to figure out which items are single-language. Then compile a list of items to transfer, selecting only one language version for each article. Translate them with AI once we’ve brought them over to dotCMS. I know you can attach a Google Translate API key, but realistically, AI is just far better at language translations, especially for technical documents. Programmatically, it would be a lot of information to store and sort through, to upload both en and fr versions, and I haven’t even fully explored the possibility for us to use the API to upload either an English or French version, then update the item with the other language variable content. If you have insight into how this works, I’d be interested in hearing about it.
Summary
- Need to map which items are single-language vs. bilingual.
- Prefer AI translation post-import.
- Unsure how to attach the second language to an existing item via API.