Migrating Xyleme to dotCMS - looking for experienced guidance/advice

I’m looking for advice and guidance on migrating from Xyleme to dotCMS. There isn’t much on the web about this particular CMS switch, so I’m reaching out to find someone who has experience, insight, or advice on how to make it as smooth as possible.

If you or someone you know has made this switch, please let me know. I have questions…


Good question, and while I haven’t had specific experience with Xyleme (maybe someone else will be more helpful to you there), I definitely want to mention the top two general best practices for you and anyone else thinking about migrations.

  1. Define your taxonomy, specifically the content types (components) currently found in your Xyleme instance. Content design and mapping is the foundation of a smooth and scalable migration, which leads me to the next thought…
  2. Content import: One of the most tedious, but critical, aspects of migration is just that: migrating content into the structures you’ve built. Hopefully Xyleme has a good way to export content. If so, you can take those exports, map them to your content types in dotCMS, and then use the content import API to get them into the platform quickly (a rough sketch follows this list). If an export isn’t possible, we could chat offline about ways to use AI to scrape and transform objects as needed.
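
To make that concrete, a migration script can push each mapped record into dotCMS through the REST content API. This is only a rough sketch (not Xyleme-specific), and the content type name, field variables, host, and credentials are placeholders you would swap for whatever your mapping defines:

# Rough sketch: create one mapped record in dotCMS via the default workflow
# action endpoint. "JobAid" and the field names are placeholders.
import requests

DOTCMS_URL = "https://your-dotcms-instance.com"   # placeholder host
AUTH = ("admin@dotcms.com", "admin")              # or a Bearer token

def import_article(title, url_title, body, tags):
    payload = {
        "contentlet": {
            "contentType": "JobAid",   # hypothetical content type
            "title": title,
            "urlTitle": url_title,
            "body": body,              # content already mapped/transformed
            "tags": ",".join(tags),
        }
    }
    r = requests.post(
        f"{DOTCMS_URL}/api/v1/workflow/actions/default/fire/PUBLISH",
        json=payload, auth=AUTH, timeout=30)
    r.raise_for_status()
    return r.json()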

Hoping others with Xyleme experience will chime in with more specifics for you, but we’re looking forward to diving in deeper if you have any specific questions about this process!

We get a zip file containing an XML file with the body content and a small amount of taxonomy data. It also includes folders with the icons and images used in each document.

I’m starting to think that creating GPT projects and migrating manually is going to be the best way: one project to migrate, one to translate. I can give the model the list of our categories and tags and have it consistently pick the best taxonomy for each new content item.

My best guess is 2–10 minutes per article per person. BUT the only model good enough at handling all that context is o3, or maybe some of the Gemini models, and we have hundreds of documents to migrate. Unfortunately, I think this is the way…

Hi Ryan, thanks for sharing the details of your migration here. It’s an area we are heavily researching and developing, especially now that AI can improve the process. Hopefully we can get you to a process that avoids manual, article-by-article migration!

I know you are working with Professional Services, but I wanted to ask if you could share a sample of the zip file you get from Xyleme (only if it’s generic info and there’s nothing sensitive in there, of course). I’d like to get a sense of what you are migrating in and what type of info Xyleme lets you export.

Is one of the challenges that Xyleme does not let you export all the taxonomy and other metadata? Or is pretty much everything there in the XML file, and it’s just a matter of mapping it to the content structure you have set up in dotCMS?

thanks!

My current understanding is that they have nested custom items in the body of the document, like carousels and dropdown menus, which makes it more difficult to export to something that plays nice. I’ve had good success transforming it with a custom GPT project, providing my tags and categories as context files, and having it converted to rich text that I can then paste into a Block body. The trick is that for the 350+ articles from the first of our 8 stakeholders, that’s over the 100-prompt o3 limit, so the $200/mo plan is required, and it would still take me around 3 days to get through them all.

Right now I’m looking at a way to script it. We have access to o3-mini through Fuel iX, which might work for our needs. The biggest issue I’ll have with scripting it, though, is getting the images in the body to map correctly.

I’ve asked my project manager if they can find an article we don’t mind sharing, and I’ll get back to you tomorrow if they give me the OK. But essentially, it looks like this:

I get a zip named “XML with Media_Enterphone Wiring”. Once unzipped, it looks like this:

XML with Media_Enterphone Wiring
├── Enterphone Wiring.xml
├── 9-Topics Archive
│   ├── Copper
│   │   └── enterphone wiring - Shaw digital terminal.PNG
│   └── Inside Wire
│       ├── Enterphone-2pair-gpon-graphic-122019.jpg
│       ├── Enterphone-intercom-bypass-graphic-122019.png
│       ├── Tii VIS-3 Unit_image_082021.PNG
│       └── … etc
└── 1-Boilerplate
    └── Icons
        └── Attention-purple@240.png
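
If I do end up scripting this, the first step would just be gathering the pieces out of each unzipped export. A rough sketch, assuming the layout above (one .xml at the root, images nested in the subfolders):

# Rough sketch: collect the article XML and every image from one unzipped
# Xyleme export so a migration script can process them together.
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}

def collect_export(export_dir):
    root = Path(export_dir)
    xml_files = list(root.glob("*.xml"))   # e.g. "Enterphone Wiring.xml"
    images = [p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS]
    return xml_files, images

xml_files, images = collect_export("XML with Media_Enterphone Wiring")
print(f"{len(xml_files)} XML file(s), {len(images)} image(s) to migrate")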

The XML has a bit of taxonomy data in it, which is helpful, but no direct mapping to the tag and category structure we’ve created. The GPT does a decent job of extracting from the provided list and keeping it consistent, which is nice.

Anyway, what are your thoughts on this? I’d love to hear your insight. Thanks for reaching out.

That’s a pretty cool approach, and I can see why you’d insert the AI process in the middle of it. I’ll see if one of the engineers can look at this post too in case there are other ideas, but one that may be interesting is to import the content as-is, putting values that don’t map to your current tags and categories into their own fields. Then, once the content is in dotCMS, use the dotAI (OpenAI API) workflow step to tag content, restricted to existing tags. The issue will be with Categories, though, as that workflow step is not yet developed for AI.

It also sounds like your challenge with the export is more involved than just getting the right tags and categories assigned to the various content items; it’s extracting the content items from the XML itself so they can be imported. For that, I’m sure some scripting is needed regardless, and it would be good to bring that up with the ProServ team you’re working with, if you haven’t already.

This is a great use case for us, Ryan. Thank you for bringing it up and walking through how you used GPT to solve it. As we build out our MCP server and migration processes, this is something we can use as a real, live use case to help guide that work.

Yeah, automating the tagging would be useful, but it’s not at the top of my concerns for the time being. My bigger blockers sit earlier in the pipeline. Here’s a brief rundown:

OpenAI Usage
Our company maintains strict policies on AI tool usage, which is a problem for my manual migration plan, since it relies on a custom GPT project using o3. We do have an approved contract with Fuel iX for wide-ranging company AI use, but obtaining approval for dotAI’s tooling is unlikely. However, if dotCMS gave me the ability to add a request proxy field to my dotAI setup, I could use Fuel iX within the existing dotAI ecosystem. You continue to build dotCMS against the OpenAI package as normal, but with the proxy set, Fuel iX essentially hijacks the request and returns responses just like a normal OpenAI API request/response. This should be relatively straightforward to implement, and if you have any interest in exploring the option further, let me know. From the outside looking in, it seems like something that would be quite easy to add, and I think it could be a big deal.
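
To illustrate what I mean by “hijacking” the request: an OpenAI-compatible gateway like Fuel iX just needs the standard client pointed at a different base URL. This is only an illustration; the gateway URL and key below are placeholders, not a real Fuel iX endpoint:

# Illustration only: the standard OpenAI client pointed at an OpenAI-compatible
# gateway. The base_url and key are placeholders for a Fuel iX endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",   # hypothetical proxy URL
    api_key="FUELIX_API_KEY",                    # gateway key, not an OpenAI key
)

resp = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Map this article to our tags: ..."}],
)
print(resp.choices[0].message.content)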

The other issue with my AI-assisted migration approach is that we’d have to get special approval to use OpenAI/ChatGPT, since it would be handling proprietary articles (potentially containing company IP). The thing is, only o3 has given me output that meets the requirements. I looked into using Fuel iX, but it only lets me create GPT projects with 4o. The Fuel iX API has o3-mini, but because ChatGPT GPT Projects only let me use o3, I have no idea whether something built on o3-mini would even work. What a headache…

Summary

  • Company policy restricts usage of dotAI/OpenAI.
  • dotCMS would need to implement a proxy field to allow dotAI to be used with Fuel iX.
  • Manual migration requires o3-level model capabilities, but Fuel iX only offers 4o (or o3-mini via the API).

Images
Our articles rely on diagrams and step-by-step graphics. Manually dragging images into place works, but I’m at a loss for a scalable, scripted approach (a rough sketch of what I mean is at the end of this section) that:

  • uploads the image assets,
  • rewrites their URLs, and
  • preserves the original placement inside each article.

Summary

  • Images must import automatically and appear in-line.
  • Need a repeatable script for upload + URL rewrite.
  • Manual fallback exists but doesn’t scale.
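
Roughly, here’s what I’m picturing for that upload + rewrite step. This is an untested sketch; I’m assuming the temp-file upload endpoint and the dotAsset content type behave the way I think they do, so the endpoint details and response shapes would need verifying:

# Untested sketch: upload each image, create an asset from it, then rewrite the
# image src attributes in the article body so placement is preserved.
import re
import requests

DOTCMS_URL = "https://your-dotcms-instance.com"   # placeholder host
AUTH = ("admin@dotcms.com", "admin")

def upload_image(path):
    # Push the binary to the temp-file endpoint (assumed response shape).
    with open(path, "rb") as f:
        r = requests.post(f"{DOTCMS_URL}/api/v1/temp",
                          files={"file": f}, auth=AUTH, timeout=60)
    r.raise_for_status()
    temp_id = r.json()["tempFiles"][0]["id"]

    # Create a dotAsset contentlet from the temp file.
    r = requests.post(
        f"{DOTCMS_URL}/api/v1/workflow/actions/default/fire/PUBLISH",
        json={"contentlet": {"contentType": "dotAsset", "asset": temp_id}},
        auth=AUTH, timeout=60)
    r.raise_for_status()
    return r.json()["entity"]["identifier"]

def rewrite_image_srcs(html, local_to_identifier):
    # Swap each local image path for its new asset URL so the image stays in
    # the same spot in the article body.
    def repl(match):
        ident = local_to_identifier.get(match.group(1))
        return f'src="/dA/{ident}"' if ident else match.group(0)
    return re.sub(r'src="([^"]+)"', repl, html)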

HTML Transformation
Our team was able to extract HTML/CSS/JS from the Xyleme articles with very good output quality. We experimented with a JobAidLegacy content type that stores the raw HTML/JS/CSS in the body field, a WYSIWYG field switched to code view. The markup renders perfectly, matching what users are used to seeing on the previous system, but the moment an editor switches from Code to WYSIWYG, all functionality, style, and structure is lost and cannot be recovered. Because these articles are updated regularly, every update would in effect require rebuilding the body from scratch, either in the legacy model or by migrating to the new JobAid with a Block body, which defeats the purpose of automating the migration. On top of that, it would drive the back-end users crazy to navigate or extract content from a raw HTML code view.

Summary

  • Current HTML dump renders but is unmaintainable.
  • Switching WYSIWYG field mode strips critical JS/CSS and HTML structure.
  • Need an approach that supports future edits without having to rebuild the item.

EN/FR Variants

In Xyleme, the English and French variants are created as two separate documents, linked to one another via a hyperlink at the top of the article body. There are a few ways we could tackle this, either programmatically or manually. We can’t assume that every document has an EN/FR counterpart, so we’d need to figure out which items are single-language, then compile a list of items to transfer, selecting only one language version for each article, and translate them with AI once they’re in dotCMS. I know you can attach a Google Translate API key, but realistically, AI is just far better at translation, especially for technical documents. Programmatically, uploading both EN and FR versions means a lot of information to store and sort through, and I haven’t fully explored using the API to upload one language version and then update the item with the other language’s content (my rough guess at that call is sketched at the end of this section). If you have insight into how this works, I’d be interested in hearing about it.

Summary

  • Need to map which items are single-language vs. bilingual.
  • Prefer AI translation post-import.
  • Unsure how to attach the second language to an existing item via API.
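
For reference, here’s my rough guess at what attaching the second language might look like, reusing the item’s identifier with a different language ID. This is completely unverified; the endpoint, field names, and the French language ID are assumptions:

# Untested sketch: add a French version to an item that already exists in
# English by reusing its identifier with a different languageId.
import requests

DOTCMS_URL = "https://your-dotcms-instance.com"   # placeholder host
AUTH = ("admin@dotcms.com", "admin")
FRENCH_LANG_ID = 2   # assumption: check the languages configured in the instance

def add_french_version(identifier, title_fr, body_fr):
    payload = {
        "contentlet": {
            "identifier": identifier,        # same identifier as the EN version
            "languageId": FRENCH_LANG_ID,
            "title": title_fr,
            "body": body_fr,
        }
    }
    r = requests.post(
        f"{DOTCMS_URL}/api/v1/workflow/actions/default/fire/PUBLISH",
        json=payload, auth=AUTH, timeout=30)
    r.raise_for_status()
    return r.json()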

Thanks so much for the breakdown, Ryan! I’ll make sure the ProServ team is aware of all the info here so they can help work out solutions, and I’ll pass it to the engineering team too, as some of these will no doubt require product updates.

A couple of initial thoughts though:

  • On OpenAI usage, we are currently updating our integration framework so you can use any LLM provider and model. I don’t know whether this will cover a service like Fuel iX; I will check with engineering to see if the update we’re working on solves that.
  • For articles with images, I’m not sure if you tried the Block Editor instead of the WYSIWYG field type, but that could be something ProServ can help look at. I’m not sure whether a migration into the Block Editor can preserve the integrity of an article with embedded images, but we’ll have to see. I’ll also raise with Engineering why users cannot switch gracefully from code to WYSIWYG.
  • On the language variants, we do have an API endpoint for language versions, so you should be able to load in the French versions. We also have a plugin that uses the OpenAI integration (with your key) to do translation via a built-in workflow, so that may be an option. We have not built this plugin into Core yet, but we can get you instructions to use it if you think it would be an option (given your constraints on using OpenAI directly).

Either way, I’ll get this all to ProServ so they have the info as you work with them. Thanks again!

It’s probably the standard ones like Gemini, Anthropic, OpenAI, etc., but what we’d actually need in order to use it is the proxy field for Fuel iX. In case your team looks at this later: the Fuel iX API key is also shorter than a standard OpenAI key, so the field validation would need to be flexible as well if a proxy URL is enabled.

Absolutely. The Block Editor works great if we AI-transform and then upload manually, but for programmatic upload I haven’t found a solid solution. If you know of something that will work, please let me know. What format do we use to insert the text so that it converts to the appropriate blocks (markdown?)? How do we insert images at the right points in the document? The biggest thing is the images; there’s no point in creating a migration script if we still have to go through and insert images manually anyway.
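
For what it’s worth, here’s my current guess at the shape the Block field expects: a ProseMirror-style “doc” of typed nodes. The heading and paragraph structure matches what the editor produces in a test contentlet I created, but the image node and its attrs are my assumption, and I’m not sure whether the field should be sent as a JSON object or a serialized string:

# My current guess at a Block body document. The "dotImage" node and its attrs
# are assumptions; confirm the exact schema before scripting against it.
import json

block_body = {
    "type": "doc",
    "content": [
        {
            "type": "heading",
            "attrs": {"level": 2, "textAlign": None, "indent": 0},
            "content": [{"type": "text", "text": "Enterphone Wiring"}],
        },
        {
            "type": "paragraph",
            "attrs": {"textAlign": None, "indent": 0},
            "content": [{"type": "text", "text": "Step-by-step wiring notes go here."}],
        },
        {
            "type": "dotImage",   # assumed node type for an embedded asset
            "attrs": {"data": {"identifier": "ASSET-IDENTIFIER-HERE"}},
        },
    ],
}

# Serialize if the endpoint expects a string value for the Block field.
payload = {"contentlet": {"contentType": "JobAid", "body": json.dumps(block_body)}}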

Can you link me to information about that plugin? Again, we could only use it if you implemented a proxy parameter so we could point it at Fuel iX, but it’s worth a look in case we DO get permission to use OpenAI’s tooling for the project.

We’re running into some issues with the Block field type and transforming our XML or HTML Xyleme exports into something it can accept as content. We’ve even hit a strange error that I’ll report to support shortly: we were able to upload a contentlet that doesn’t appear anywhere in our dotCMS GUI. It exists, and it’s even blocking the unique-slug requirement for subsequent attempts at the same document, but you have to search via the API to find it.

POST {{base_url}}/api/es/search

{
  "query": {
    "query_string": {
      "query": "urltitle:*authentification-unique-du-client-dpannage* OR slug:*authentification-unique-du-client-dpannage* OR urlmap:*authentification-unique-du-client-dpannage*"
    }
  },
  "size": 100
}

This returns a 500 status code and:

Unable to populate contentlet from table columns for ID='a49367a2c1237e62a4ca5eb11aa19cf8', Inode='80861f92-0607-4293-95cd-2d93ef088c9c', Content-Type 'eab965ab0e7cabbbd25364f34be3a754': An error occurred when refreshing Story Block Contentlet references in parent Content 'a49367a2c1237e62a4ca5eb11aa19cf8': Cannot construct instance of `java.util.LinkedHashMap` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('{
    "type": "doc",
    "content": [
        {
            "type": "block",
            "containerid": "heading",
            "class": "heading-block",
            "content": {
                "type": "header",
                "attributes": {
                    "text": "Authentification unique du client – Dépannage",
                    "level": 2
                }
            }
        },
... etc

I know I have a test contentlet with the slug “cookies-are-great” so I did the same search for that slug and got:

{
    "contentlets": [
        {
            "dateReviewBy": "2026-12-31 08:00:00.0",
            "publishDate": "2025-07-16 15:38:15.579",
            "body_raw": "{\"type\":\"doc\",\"attrs\":{\"charCount\":2794,\"wordCount\":473,\"readingTime\":2},\"content\":[{\"type\":\"heading\",\"attrs\":{\"indent\":0,\"textAlign\":null,\"level\":1},\"content\":[{\"type\":\"text\",\"text\":\"Chocolate Chip Cookie Complete Guide\"}]},{\"type\":\"heading\",\"attrs\":{\"indent\":0,\"textAlign\":null,\"level\":1},\"content\":[{\"type\":\"text\",\"text\":\"History of the Chocolate Chip Cookie\"}]},
... etc

@MarcBoutillette - Together with your experience and the above, do you have any insight as to what we might be doing wrong in posting our content items?

Additionally, do you have any good documentation I can sift through to find out more on the correct way to handle uploading a Block body field value?

That looks like a bug that has already been solved. What version are you running? If it’s current, I would open a support ticket on this.

Hi Ryan! I’ll try to answer all of these here, though for actual migration questions it’s better to work with your ProServ contact, and with Support if you run into issues.

Regarding support for Fuel iX, I’m not 100% certain that all proxy services like that will be supported. We plan to use LangChain, so anything supported through that framework should work. We’ll have to see whether layering in another LLM aggregation service like Fuel iX takes additional work to support.

Here is information about the translation plugin. At this point it uses our OpenAI integration, so it will not work with Fuel iX, but if you want to check it out you can access it on our GitHub: com.dotcms.ai.v2/README.md at main · dotCMS/com.dotcms.ai.v2 · GitHub

Good luck Ryan!