Building Spatial Memory Part 3: What I Learned Building an AR App for Location-Based Memories

Let me be honest with you — I had no idea how hard AR development actually was when I started this project.

I had this cool idea while walking in the mountains a few months ago: "What if you could pin digital notes and photos to real-world coordinates? Like a Pinterest for the physical world. Only when you're physically standing there can you unlock the memory."

That idea sounded amazing in my head. I'd already built the backend with PostGIS and Redis GEO (you can read about that in my previous post), and I thought "Okay, the frontend AR part can't be that much harder, right? It's 2026, all the APIs are there."

Yeah. About that. I learned the hard way.

The Reality Check: AR Isn't Magic (Yet)

So I started looking at options. I'm an iOS person primarily, so ARKit is the obvious choice. I'd heard great things — people building amazing AR experiences. But when I actually sat down to code...

Honestly, the first week was just one WTF moment after another.

Let me show you what I expected vs what I got.

What I expected:

Give me GPS coordinate → show content at that coordinate → done.
ARKit handles the heavy lifting of tracking.

What actually happened:

GPS is only accurate to 5-10 meters in open areas
GPS accuracy drops to 20-50 meters in cities (thanks, "urban canyon" effect from tall buildings)
AR tracking drifts if you stand still for more than a minute
World tracking doesn't work well in featureless areas (open fields, plain walls)
Different phones have wildly different performance

Oh, and that's just the beginning.

The First Big Problem: GPS vs AR Coordinate Systems

Here's something no tutorial really explains clearly: GPS gives you latitude/longitude on the Earth's surface. ARKit gives you a local coordinate system, measured in meters, relative to where you started the session. They don't talk to each other automatically.

You have to convert between them yourself.

Let me show you the code I ended up with (it's actually not that bad once you figure it out):

import ARKit
import CoreLocation

class ARCoordinateConverter {
    private var startLocation: CLLocation?
    private var startTransform: float4x4?

    func setStartLocation(_ location: CLLocation, _ transform: float4x4) {
        self.startLocation = location
        self.startTransform = transform
    }

    func arPosition(for targetLocation: CLLocation) -> SCNVector3? {
        guard let startLocation = startLocation, let startTransform = startTransform else {
            return nil
        }

        // Calculate distance and bearing from start to target
        let distance = targetLocation.distance(from: startLocation)
        let bearing = startLocation.bearing(to: targetLocation)

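        // Convert bearing/distance to Cartesian coordinates in AR space.
        // Bearing is degrees clockwise from north. ARKit is right-handed with -Z as
        // "forward" (north under .gravityAndHeading alignment), so north maps to -Z
        // and east maps to +X.
        let radians = bearing * .pi / 180.0
        let x = Float(distance * sin(radians))
        let z = Float(-distance * cos(radians))

        // Apply the transform captured at the start location.
        // float4x4 multiplies a simd_float4, not an SCNVector4.
        let rotatedPosition = startTransform * simd_float4(x, 0, z, 1)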

        return SCNVector3(rotatedPosition.x, rotatedPosition.y, rotatedPosition.z)
    }
}

// Helper extension for bearing calculation
extension CLLocation {
    func bearing(to destination: CLLocation) -> Double {
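        // CLLocation exposes latitude/longitude through its `coordinate` property
        let lat1 = coordinate.latitude * .pi / 180.0
        let lon1 = coordinate.longitude * .pi / 180.0
        let lat2 = destination.coordinate.latitude * .pi / 180.0
        let lon2 = destination.coordinate.longitude * .pi / 180.0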

        let dLon = lon2 - lon1
        let y = sin(dLon) * cos(lat2)
        let x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon)
        let bearing = atan2(y, x) * 180.0 / .pi

        return (bearing + 360).truncatingRemainder(dividingBy: 360)
    }
}

That's the core conversion. Seems simple enough, right? But here's where it gets tricky...

The Second Big Problem: Drift, Drift Everywhere

ARKit's world tracking is really good — for objects around you in a room. But when you're talking about placing things 50 meters away and walking there, drift becomes a huge problem.

What happens:

You start at point A, get GPS fix, start AR session
You walk 50 meters to point B
ARKit's tracking has drifted by 2-5 meters along the way
Your pinned memory is now floating in mid-air, not where it's supposed to be

I spent about two weeks trying different solutions to this. Let me save you some time:

What Didn't Work

1. Just rely on ARKit alone

Drift is too much after 50+ meters. Your pins are never where you expect.

2. Reset tracking every 10 meters

You lose world anchor continuity, and pins jump around every time you reset. Super disorienting for users.

3. Use image features to relocalize

Works great if there are unique features (statues, unique buildings). Fails completely in open fields or featureless suburbs. Not reliable for generic use.

What Kind Of Works (Good Enough For a Side Project)

I ended up with a hybrid approach that's definitely not perfect but works for 80% of cases:

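// `SpatialAnchor` is an assumed name for the app's own wrapper type (GPS coordinate +
// SceneKit node). ARAnchor itself has neither `gpsCoordinate` nor `node`.
func updatePosesIfNeeded(currentLocation: CLLocation, anchors: [SpatialAnchor]) {
    // If we have good GPS accuracy (< 10m), correct AR positions.
    // A negative horizontalAccuracy means the fix is invalid, so skip those.
    let accuracy = currentLocation.horizontalAccuracy
    if accuracy >= 0 && accuracy < 10 {
        // Recalibrate the offset between GPS and AR coordinates
        coordinateConverter.updateCalibration(currentLocation: currentLocation)
    }

    // For each anchor, blend between AR position and GPS position based on accuracy
    for anchor in anchors {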
        let arPosition = getARPosition(from: anchor)
        let gpsPosition = getGPSPosition(from: anchor.gpsCoordinate)

        // Blend factor: more GPS weight when accuracy is better
        let blendWeight = calculateBlendWeight(currentAccuracy: currentLocation.horizontalAccuracy)
        let finalPosition = blend(arPosition, gpsPosition, weight: blendWeight)

        anchor.node.position = finalPosition
    }
}

The key insight: GPS is more accurate on a global scale, AR is more accurate locally. Blend them based on GPS accuracy. When GPS gets a good fix, use it to nudge everything back into alignment.

It's not perfect, but it's good enough for a hobby project. Users expect some wonkiness with AR anyway.
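The snippet above leans on calculateBlendWeight and blend without showing them. Here's a minimal sketch of what those two helpers could look like. To be clear, this is an illustration and not the code from the repo: the Vec3 type is a stand-in for SCNVector3, and the 5 m / 30 m thresholds are numbers I picked for the example, so tune them against your own field tests.

```swift
import Foundation

/// Stand-in for SCNVector3 / SIMD3 so the sketch runs without SceneKit.
struct Vec3: Equatable {
    var x, y, z: Double
}

/// Maps GPS horizontal accuracy (meters, smaller is better) to a weight in 0...1.
/// `best` and `worst` are assumed thresholds: at or below `best` GPS is fully trusted,
/// at or above `worst` the AR position is used unchanged.
func blendWeight(forAccuracy accuracy: Double, best: Double = 5, worst: Double = 30) -> Double {
    // Core Location reports a negative accuracy when the fix is invalid
    guard accuracy >= 0 else { return 0 }
    let t = (worst - accuracy) / (worst - best)
    return min(max(t, 0), 1)
}

/// Linear interpolation: weight 0 returns the AR position, weight 1 the GPS-derived one.
func blend(_ ar: Vec3, _ gps: Vec3, weight: Double) -> Vec3 {
    Vec3(x: ar.x + (gps.x - ar.x) * weight,
         y: ar.y + (gps.y - ar.y) * weight,
         z: ar.z + (gps.z - ar.z) * weight)
}
```

A linear ramp is the simplest thing that behaves sensibly at both ends. If pins visibly jump when a good fix arrives, smoothing the weight over a few frames is probably the next thing to try.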

The Third Big Problem: User Experience Is Actually Harder Than the Tech

Okay, so you get coordinates converted and drift somewhat under control. Now what?

How do users actually place a pin?

I went through three different UX designs before I found something that sort of works.

Attempt 1: "Just tap where you want it"

People would tap the screen, but they had no idea what distance that corresponded to. Taps would end up 2 meters away or 200 meters away. Complete guesswork for users. Abandoned this after user testing with one person.

Attempt 2: "Pin it at your current location"

Technically easier, but what's the fun in that? You can't place a pin across the valley on that scenic overlook you're looking at. Defeats the whole idea of "pin it where it actually is."

Attempt 3 (Current): Slider Distance + Tap Direction

Users tap the direction they want to place the pin, then drag a slider to set how far away it is. Preview the pin position as they drag. Place when it looks right.

// When user taps the screen
func handleTap(_ gesture: UITapGestureRecognizer) {
    let location = gesture.location(in: sceneView)
    // Note: hitTest(_:types:) is deprecated since iOS 14; raycasting is the recommended replacement
    guard let hitResult = sceneView.hitTest(location, types: .featurePoint).first else {
        return
    }

    // Get the direction from camera to tap point.
    // worldTransform.columns.3 is the hit point's position, not a direction,
    // so subtract the camera position before normalizing.
    let hit = hitResult.worldTransform.columns.3
    let hitPosition = SIMD3<Float>(hit.x, hit.y, hit.z)
    let distance = currentSliderValue // from 1 to 100 meters

    // Calculate final position along that direction
    let normalizedDirection = normalize(hitPosition - cameraPosition)
    let finalPosition = cameraPosition + normalizedDirection * Float(distance)

    // Convert AR position back to GPS coordinate and save
    let gpsCoordinate = converter.convertARPositionToGPS(finalPosition)
    createSpatialPin(at: gpsCoordinate, arPosition: finalPosition)
}

This is what I shipped. It's still not perfect, but users can actually get it where they want it after 10-15 seconds of fiddling. That's a win in my book.
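The tap handler calls convertARPositionToGPS, which isn't shown here. Below is a rough sketch of how that inverse conversion could work for short distances. It makes a few assumptions: the session uses .gravityAndHeading alignment (so +X is east and -Z is north), the Earth is treated as locally flat, and GeoPoint is a plain stand-in for CLLocationCoordinate2D. The function names are made up for this example, and the helper in the repo may do it differently.

```swift
import Foundation

/// Plain latitude/longitude pair in degrees (stand-in for CLLocationCoordinate2D).
struct GeoPoint {
    var latitude: Double
    var longitude: Double
}

/// Moves `origin` by an offset in meters using an equirectangular (flat-earth)
/// approximation. Fine for the 1 to 100 m range of the slider, not for long distances.
func coordinate(from origin: GeoPoint, eastMeters: Double, northMeters: Double) -> GeoPoint {
    let earthRadius = 6_371_000.0 // mean Earth radius in meters
    let latitudeRadians = origin.latitude * .pi / 180
    let deltaLat = northMeters / earthRadius
    // A degree of longitude shrinks with cos(latitude)
    let deltaLon = eastMeters / (earthRadius * cos(latitudeRadians))
    return GeoPoint(latitude: origin.latitude + deltaLat * 180 / .pi,
                    longitude: origin.longitude + deltaLon * 180 / .pi)
}

/// Hypothetical inverse of the GPS-to-AR conversion: takes an AR position's x and z
/// (meters from the session origin) and returns the matching coordinate.
func gpsCoordinate(forARX x: Double, z: Double, sessionOrigin: GeoPoint) -> GeoPoint {
    // Assumed axes: +X points east, -Z points north
    coordinate(from: sessionOrigin, eastMeters: x, northMeters: -z)
}
```

At 100 meters the flat-earth error is far smaller than the GPS error you're already living with, so the simpler math seems like a fair trade.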

Privacy: The Surprisingly Easy Win

One of the biggest concerns I had going in was privacy. If people are pinning personal memories to locations, do I really want all those photos on my server?

Here's what I ended up doing, and it's been perfect:

All photos go directly to Cloudflare R2 from the user's phone. My backend never touches them.

The flow:

User takes a photo in the app
App requests a pre-signed PUT URL from my backend
App uploads directly to R2
Backend never sees the image bytes
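The client half of that flow could look something like the sketch below. It assumes the app has already fetched upload_url from the backend and has the JPEG bytes in memory. The function names are mine, not the app's, and the real client may set different headers.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // needed for URLRequest/URLSession on Linux
#endif

/// Builds the PUT request for a pre-signed upload URL.
/// The signature lives in the URL's query string, so no auth header is added.
func makeUploadRequest(uploadURL: URL, contentType: String = "image/jpeg") -> URLRequest {
    var request = URLRequest(url: uploadURL)
    request.httpMethod = "PUT"
    request.setValue(contentType, forHTTPHeaderField: "Content-Type")
    return request
}

/// Sends the image bytes straight to object storage and reports success or failure.
func upload(_ imageData: Data, to uploadURL: URL, completion: @escaping (Bool) -> Void) {
    let request = makeUploadRequest(uploadURL: uploadURL)
    let task = URLSession.shared.uploadTask(with: request, from: imageData) { _, response, error in
        let status = (response as? HTTPURLResponse)?.statusCode ?? 0
        // Any 2xx status counts as a successful upload
        completion(error == nil && (200..<300).contains(status))
    }
    task.resume()
}
```

One thing to watch: if the backend ever includes the content type when signing, the header sent here has to match exactly or the storage side will reject the request.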

Code snippet from the Go backend:

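import (
    "encoding/json"
    "fmt"
    "net/http"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/google/uuid"
)

// createPreSignedUploadHandler returns a short-lived URL the app can PUT a photo to.
// Assumes s.presigner is an *s3.PresignClient (aws-sdk-go-v2) pointed at the R2 endpoint.
func (s *Server) createPreSignedUploadHandler(w http.ResponseWriter, r *http.Request) error {
    userID := getCurrentUserID(r)
    objectKey := fmt.Sprintf("%s/%s.jpg", userID, uuid.New().String())

    // Generate pre-signed PUT URL valid for 15 minutes
    input := &s3.PutObjectInput{
        Bucket: aws.String(r2Bucket),
        Key:    aws.String(objectKey),
    }

    // The v2 presigner takes a context and sets the expiry through an option
    presigned, err := s.presigner.PresignPutObject(r.Context(), input, s3.WithPresignExpires(15*time.Minute))
    if err != nil {
        return err
    }

    w.Header().Set("Content-Type", "application/json")
    return json.NewEncoder(w).Encode(map[string]string{
        "upload_url": presigned.URL,
        "public_url": fmt.Sprintf("https://%s.r2.dev/%s", r2Bucket, objectKey),
    })
}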

This is brilliant for three reasons:

Zero egress fees (thanks Cloudflare R2!)
I don't have to store or process user images → less liability, less infrastructure
Privacy by design — if something goes wrong with my server, your photos are still private because I don't hold them

Pros:

Extremely cheap (R2's free tier includes 10 GB of storage, and my bill is usually < $0.50/month)
Scales infinitely — I can handle 10 users or 10,000 users without changing anything
I don't have to think about image processing or abuse reports (well, okay, I do have to respond to DMCA if needed, but that's expected)

Cons:

No automatic image compression on the server — have to do it on the client
Users with bad connections might fail mid-upload and need to retry
I can't generate thumbnails on the server, so client has to do that too
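For the flaky-connection case, one option is exponential backoff that gives up once the pre-signed URL is about to expire. A minimal sketch follows. The function names, the 1 second base, and the 60 second cap are all assumptions for illustration. Only the 15 minute lifetime comes from the backend snippet.

```swift
import Foundation

/// Seconds to wait before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
func retryDelay(attempt: Int, base: Double = 1.0, cap: Double = 60.0) -> Double {
    precondition(attempt >= 1, "attempt is 1-based")
    return min(cap, base * pow(2.0, Double(attempt - 1)))
}

/// Whether waiting for the next retry still fits inside the pre-signed URL's lifetime.
/// Once this returns false, the app should request a fresh upload URL instead.
func canRetry(attempt: Int,
              issuedAt: Date,
              now: Date = Date(),
              urlLifetime: TimeInterval = 15 * 60) -> Bool {
    let elapsed = now.timeIntervalSince(issuedAt)
    return elapsed + retryDelay(attempt: attempt) < urlLifetime
}
```

The point of checking the lifetime is that a retry against an expired URL can never succeed, so it's cheaper to go back to the backend for a new one.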

Overall, totally worth it for a side project.

Let's Get Real: Pros and Cons of This Whole Approach

I've been working on this for about three months now, and I have some thoughts.

What's Working Well ✅

The idea is still cool — when it actually works, it's magical. You stand where you took your wedding photo, open the app, and there's your original photo floating right where you stood. Chills.
Backend architecture is solid — PostGIS + Redis GEO + R2 has not once let me down. All queries are < 50ms, costs are negligible. That part's actually done.
Privacy model works — users seem to appreciate that I'm not hoarding their photos. The direct upload just works.
ARKit gets you 80% there for free — you don't need to build any tracking yourself. Apple's done the hard work. That part works pretty well.

What Still Sucks ❌

GPS accuracy is still the biggest problem — in cities with tall buildings, you can be off by 20+ meters. Your pin is in the wrong building. There's only so much you can do about this with just a phone.
AR drift is always with you — the longer the walk, the worse it gets. My hybrid blending helps, but it doesn't solve the problem.
Battery life is atrocious — ARKit + GPS + screen on continuously = your phone dies in 2-3 hours. You can't use this for a full day of hiking. It's just not feasible.
Cold start is brutal — no one's using it until there are enough interesting pins, but no one pins interesting things until there are users. Classic chicken and egg problem. It's a side project, so I don't have the resources to seed it with thousands of pins.
Android support is on my TODO list but... — ARCore is similar but different. I'd have to rewrite a bunch of code. Not happening unless there's actual interest.

Who Should Try This?

You're building a tourist app for a specific city/area — GPS accuracy is good enough, you can pre-place all the pins, users don't walk that far (so drift is minimal). This works great.
Indoor location-based experiences — GPS doesn't work indoors, but you can use Bluetooth beacons. If you control the space, this is amazing.
Experimental / art projects — it's a cool concept, people forgive the rough edges when the idea is interesting.

Who Should Stay Away?

If you need 1-meter accuracy everywhere — you're gonna have a bad time. Wait another 3-5 years for better AR/VR hardware.
If you want to replace Google Maps — don't even try. Google's been at this forever, they have way more data.
If you expect it to work perfectly on the first try — AR location-based stuff is still the wild west. Be prepared to spend weeks debugging edge cases.

My Surprising Takeaway After Three Months

Honestly? I went into this thinking AR was going to be the hard part. Turns out, AR is the easy part.

The real hard parts are all the things that sound simple:

Getting coordinates from two different systems to agree
UX design that makes sense to regular people
Dealing with the limitations of phone GPS in different environments
Battery life that's actually usable

That said — it's still a fun project. When it works, it feels like magic. And that's why we do side projects, right? Not everything needs to be a billion-dollar startup. Sometimes you just have an idea you want to see working in the world.

If you want to check out the code (it's open source), you can find it here: https://github.com/kevinten10/spatial-memory. Everything's there — backend Go code, iOS Swift code, the whole shebang.

Question For You

Have you ever tried building an AR location-based app? What surprised you the most? Did you solve the drift problem better than I did? I'd love to hear your experiences in the comments — I'm still learning this stuff.

Or if you've ever had that moment where you thought "I wish I could leave a note here for future me to find," does this sound like something you'd use? Drop a comment below.