Patrick O'Brien

Breakdown of Apple’s Augmented Reality iOS frameworks and tools

Recently my colleague Kevin Muessig wrote an excellent introductory blog post about SAP Fiori for iOS ARKit, the newly released open-source repository my team published. As its lead developer I had to learn a new domain well, and I'd like to share that knowledge. There's no better time than now to demystify the Augmented Reality tools for iOS and the concepts FioriARKit uses under the hood.

In the last few years Apple has had a flurry of releases in the AR space: first ARKit, followed by RealityKit, Reality Composer, Reality Converter Beta, Lidar-equipped devices, and Object Capture. Whoa, that's a lot to unpack.

What makes these frameworks/tools different and how do we use them together?

Reality Composer

Before jumping into code, a good starting point is thinking about scene creation. Creating a scene completely in code is absolutely possible yet poses significant challenges. You'll have to dust off some linear algebra concepts and work in a 3D coordinate system, which can be tedious, confusing, and time consuming. Reality Composer immensely simplifies this with a visual, interactive interface for populating a scene with virtual 3D models relative to a chosen anchor, such as a horizontal plane. These 3D models can be given behaviors, which consist of a trigger (e.g. tap) followed by a sequence of actions (e.g. movement, an animation, an audio track, etc.). The scene can then be tested in an AR preview.


Reality Composer Mac App

In the example above, I chose a plane anchor and placed a shoe and an airplane into the scene. I added two behaviors to the airplane: one to orbit the shoe and one to rotate so the airplane appears to be turning.

Reality Composer can be intuitive, but I do find that composing complex scenes and interactions can be tricky. For example, I attempted to recreate the Brick Breaker game, but the behavior was too complex to express in the tool. Setting up the scene in Reality Composer and then handling the physics in code was a better approach. This highlights the tool as an extension of development, not a replacement for it.

Reality Converter Beta

Apple uses two formats for 3D assets: reality and usdz. The former is a proprietary Apple format; the latter is Apple's extension of Pixar's USD format. Since usdz is rather new, many 3D models exist in other formats such as obj, stl, and fbx, which vary in whether and how they store textures and model animations. Reality Converter Beta converts these other 3D file types into usdz, a process that can be frustrating with the current usdz command line tools.


Reality Converter Beta with shoe usdz file

It's important to note that a single model such as the shoe can be a reality or usdz file, but the entire orbit scene pictured above can also be exported as a usdz or reality file. That single file conveniently includes the anchor data, scene hierarchy, animations, behaviors, textures, etc.

Object Capture

Modeling 3D assets is an expensive skill to acquire and a huge barrier to creating Augmented Reality experiences. This year at WWDC Apple announced Object Capture, which uses photogrammetry on a Mac to process a directory of images into a 3D asset. The shoe displayed above was made with Object Capture and has impressive detail.


ARKit

Apple's documentation summarizes ARKit best:

“ARKit combines device motion tracking, camera scene capture, advanced scene processing, and display conveniences to simplify the task of building an AR experience.”

ARKit uses a technique called visual-inertial odometry: a combination of the device's motion sensors and computer-vision analysis of the camera feed, used to recognize notable features and track the world, such as detecting a flat surface like a table or the floor. It uses the concept of anchors to… well, anchor virtual content to these features, then keeps track of their position and orientation over time as the device moves.

ARAnchor is the parent class, with several subclasses describing the features virtual content can be anchored to, such as ARObjectAnchor, ARImageAnchor, ARPlaneAnchor, and ARGeoAnchor.

ARKit handles different behavior through configurations; only one configuration can run at a time, and each suits different use cases. World Tracking maintains a world coordinate system and places content into it. For example, if an image anchor with content is detected and the image later leaves view, the content stays at its last known location in the world. Image Tracking forgoes the world coordinate system and instead emphasizes tracking known images, persisting content on them if and only if the images are visible. Geo Tracking also uses a world coordinate system, yet allows geo anchors at specific latitudes and longitudes in supported cities.
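To make the distinction concrete, here is a minimal sketch of selecting and running a configuration. The class and property names are real ARKit APIs; the "AR Resources" asset-catalog group holding reference images is an assumption for illustration.

```swift
import ARKit
import RealityKit

let arView = ARView(frame: .zero)

// World Tracking: content is placed into a persistent world coordinate system.
let worldConfig = ARWorldTrackingConfiguration()
worldConfig.planeDetection = [.horizontal, .vertical]

// Image Tracking: content tracks known images and only persists while they are visible.
let imageConfig = ARImageTrackingConfiguration()
if let references = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources",
                                                     bundle: nil) {
    imageConfig.trackingImages = references
    imageConfig.maximumNumberOfTrackedImages = 1
}

// Only one configuration can run on the session at a time.
arView.session.run(worldConfig)
```

Swapping `worldConfig` for `imageConfig` in the final `run` call switches the session's behavior entirely.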


So how does RealityKit tie in?

“RealityKit framework implements high-performance 3D simulation and rendering. RealityKit leverages information provided by the ARKit framework to seamlessly integrate virtual objects into the real world.”

At a high level, ARKit uses the device's sensors and camera to understand the scene, but it does not render virtual content. From the information ARKit provides, RealityKit renders the content, which can behave realistically with lighting, occlusion, physics, audio, and interactions with the real environment. You'll rarely have to work with ARKit's API unless you want complete control, as we do in FioriARKit; RealityKit is a layer above that abstracts away much of that complexity.

RealityKit uses the Entity-Component-System architectural pattern that is commonly seen in game engines such as Unity. This deserves an entire post, yet here are the basics.

Entity is a base class with the fundamental properties a 3D asset needs, such as a Transform (position, scale, rotation). Components represent different behaviors or functionality; Transform conforms to the Component protocol, and Entity conforms to HasTransform. This is a powerful concept because Entities only contain the Components they need to function. You can then create different Systems that operate on Entities having some combination of Components for different behaviors.
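The basics above can be sketched with a custom component. `SpinComponent` and its property are hypothetical names for illustration; `Component`, the `components` subscript, and `transform` are real RealityKit APIs (on newer RealityKit versions, custom components should also be registered via `registerComponent()`).

```swift
import RealityKit

// A hypothetical custom component: any type conforming to Component
// can be attached to an Entity.
struct SpinComponent: Component {
    var revolutionsPerSecond: Float = 0.5
}

// Entities carry only the components they need.
let entity = Entity()
entity.components[SpinComponent.self] = SpinComponent(revolutionsPerSecond: 1.0)

// Transform is itself a component; Entity conforms to HasTransform,
// so position can be set directly.
entity.transform.translation = [0, 0.1, 0]
```

A System could then iterate over every entity carrying a `SpinComponent` and animate it each frame.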

Apple has implemented two primary subclasses, AnchorEntity and ModelEntity.

AnchorEntity conforms to HasAnchoring, which means it has an AnchoringComponent storing anchoring data. It bridges to ARAnchors and is not meant for rendering 3D assets. Under the hood, AnchorEntities wrap ARAnchors, or this can be done explicitly when an ARAnchor is discovered.

ModelEntity is for 3D assets. It conforms to HasModel and HasPhysics: its ModelComponent holds the 3D mesh, and the physics components govern how instances behave in simulation.

The flow is adding ModelEntities to AnchorEntities as children; the AnchorEntities are then added to the ARView's scene.
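That flow looks like this in code, a minimal sketch using a generated box in place of a loaded usdz model (all API names here are real RealityKit calls):

```swift
import RealityKit

let arView = ARView(frame: .zero)

// Anchor virtual content to the first horizontal plane ARKit finds.
let anchorEntity = AnchorEntity(plane: .horizontal)

// A generated box mesh as a stand-in for a loaded usdz asset.
let box = ModelEntity(mesh: .generateBox(size: 0.1),
                      materials: [SimpleMaterial(color: .blue, isMetallic: false)])

// ModelEntities are added to AnchorEntities as children...
anchorEntity.addChild(box)

// ...and AnchorEntities are added to the ARView's scene.
arView.scene.addAnchor(anchorEntity)
```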

There is also a generated code file that simplifies bridging with Reality Composer. After you drag an rcproject file into your Xcode project, this generated file contains convenience methods for extracting the scenes and getters for child Entities. These convenience declarations are named after the rcproject file, scene, and entity names. Assuming the ARView is properly added to the app, it's as simple as these three lines:

let arView = ARView(frame: .zero)
let orbitScene = try! Sample.loadOrbitScene() // Sample.rcproject has the scene named OrbitScene
arView.scene.anchors.append(orbitScene)


Airplane orbits shoe on tap

The rcproject conveniently adds a reality file containing the OrbitScene to the app bundle for you. The OrbitScene is extracted as an AnchorEntity with the shoe and airplane as child Entities. Once it's added to the ARView's scene, RealityKit automatically looks for an ARAnchor that ARKit finds (a horizontal plane, since that's what the scene was built on in Reality Composer) and places the AnchorEntity with its content there. Tapping the airplane starts the orbit and rotation actions.

This can also be done programmatically with only RealityKit, or explicitly from ARKit using the ARSessionDelegate. However, we lose the orbiting behavior and would have to implement it in code using the airplane's transform and animate its movement, which is not trivial. The image and models also need to be included as resource files. Adding them to the scene with RealityKit and ARKit can be done like so:

func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    // Find the first image anchor ARKit discovered, if any.
    guard let imageAnchor = anchors.compactMap({ $0 as? ARImageAnchor }).first else { return }
    let anchorEntity = AnchorEntity(anchor: imageAnchor)

    let airplaneModel = try! ModelEntity.load(named: "airplane")
    let shoeModel = try! ModelEntity.load(named: "shoe")

    // Maybe some code to handle their position, scale, and behavior
    // With the Entity Component System this could be done elegantly

    anchorEntity.addChild(airplaneModel)
    anchorEntity.addChild(shoeModel)
    arView.scene.addAnchor(anchorEntity)
}

You may ask yourself where Lidar fits into all of this. A Lidar sensor emits light and times how long it takes to return to the device; Apple's devices send waves of infrared pulses that form points, from which a mesh of the environment is created. This is useful for mapping a room with sceneReconstruction, where ARKit can attempt to classify parts of the mesh as floor, ceiling, window, wall, seat, etc. There are applications for this beyond leisure AR, such as indoor mapping and accurate depth detection. For AR, it expands where content can be placed: before Lidar we were limited to vertical and horizontal planes, and now content can be placed on complex geometry.
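Enabling scene reconstruction is a small addition to a world-tracking session; here is a sketch (real ARKit API names, available only on Lidar-equipped devices):

```swift
import ARKit
import RealityKit

let arView = ARView(frame: .zero)
let config = ARWorldTrackingConfiguration()

// Scene reconstruction requires a Lidar sensor, so check support first.
if ARWorldTrackingConfiguration.supportsSceneReconstruction(.meshWithClassification) {
    // Builds a mesh of the environment and attempts to classify it
    // (floor, ceiling, window, wall, seat, ...).
    config.sceneReconstruction = .meshWithClassification
}

arView.session.run(config)
```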


3D Vase on Complex Geometry of Arm Chair Pillow

Colorized Depth Map

(WWDC Explore ARKit 4)


This is just a snapshot of what's possible. Apple has done an incredible job of providing the tools for crafting Augmented Reality experiences. The technology is consistently improving, and enterprise use cases are now more feasible than ever. SAP Fiori for iOS ARKit is an evolving response to this innovation. While FioriARKit abstracts most of the above away from the developer, it's useful and interesting to understand the underlying concepts.
