A number of years ago, I set out to automate a set of manual tests that we'd been using for years to validate functionality and UI in Cartographica. I've been through a lot of technologies over the years: some expensive commercial tools, some open source. I won't go through an exhaustive list of what we used and how we found them, but I will say that the last technology we used was Appium, a tool created mostly for mobile (macOS does occasionally benefit from its younger, more popular sibling), which models itself on Selenium and uses that popular web testing tool as the orchestration layer.
When I moved to Appium, I'd already been doing something similar using a hand-modified version of Pyatom, a Python-based automated testing framework for the Mac that uses the macOS Accessibility Framework to drive the application under test. I implemented some custom modifications for testing Cartographica, but the system required that all tests be written manually in Python. That's fine, I'm good with Python, but there were no recording mechanisms or other tools to make the process easier.
After happening upon Appium when trying to automate testing for CartoMobile, I realized I could take advantage of the Selenium offshoot to get some much-needed tooling around my testing for Cartographica. This worked pretty well, although the recording tools never turned out to be a good shortcut, so I ended up writing all of my tests in Python manually anyway. Still, there were third-party tools, and integrations with popular testing and continuous integration frameworks, such as Jenkins.
In the end, I added about 25 tests for Cartographica using Appium before the relative brittleness got to me. In addition to Appium being a bit difficult to manage on the Mac, I ran into some significant timing problems and issues with item occlusion when doing drag testing. Although that wasn't the end of the world, it was a large time sink, and the tests themselves, even when hardened, had a failure rate of 3-8% per run, which meant they weren't so much a gate as a hurdle. My last check-in notes from when I was still hoping for better GUI test development were in 2014, and read "Try once again to get the drag-and-drop tests working under 10.10". They did not work, even after many days of trying, and I finally gave up and went back to manually running and validating the