Building the foundations for Photo Intelligence
In my first newsletter, I introduced Photo Intelligence: the idea that AI can supercharge photos, enabling businesses and people to do things that were previously considered impossible, incredibly time-consuming, or too expensive. Along with that thesis, I shared feedback from a wide variety of people I interviewed about the specific pain points that Photo Intelligence could help with: staying connected with others, curating their photos, and creating digital and physical artifacts from those photos.
While I found strong agreement on the thesis, there were two additional questions that nobody could answer: 1) Is the technology mature enough for all of these use cases? and 2) Do I have the knowledge and skills to implement it?
I quickly found that the technology is still difficult to leverage, whether as easy-to-use open-source software or as services provided by other companies. I ended up studying the history and evolution of techniques for face recognition, large vision-language models, and image quality assessment (thanks ChatGPT Deep Research, Wikipedia, and arXiv!). I then combined those learnings with my existing knowledge from decades of working with photos as a hobbyist photographer and software engineer to build something useful.
I’ve spent the past few months prototyping and building the confidence that I could use Photo Intelligence to create consumer apps that solve these pain points. Along the way, I also realized that it would potentially be even more impactful to create a developer platform that makes it dramatically easier for others to leverage Photo Intelligence themselves.
What I’ve built
Before getting to the cool features: any platform that works with photos needs the basics, such as the ability to store photos, group them together into albums, and so on. Also, whenever I've built platforms in the past, I've found it incredibly helpful to build for real use cases, both to ensure that the APIs work well and to demonstrate what kinds of real experiences they can power.
An API-compatible, cloud-hosted backend for Immich
So I built a new cloud-hosted backend that can power Immich, an incredible open-source project that aims to be a self-hosted replacement for Google Photos. To make this work, I implemented all of the foundational components needed to store and manage photos, and then exposed those components through endpoints compatible with Immich's API. I can point the Immich mobile app or web app at my backend, and it just works.
Immich web application powered by my server
I didn't build out all of the APIs (for example, none of the sharing features work), but I built enough to convincingly demonstrate two things: that I could build an alternative cloud-hosted backend for Immich with state-of-the-art Photo Intelligence features, and that doing so would make Immich a lot easier for non-technical people to use, without the incredibly high hurdle of self-hosting.
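To make the approach concrete, here's a minimal sketch of the idea in TypeScript with Express: serve endpoints whose paths and response shapes match what the Immich apps expect, backed by your own storage. The routes below are simplified stand-ins I wrote for illustration, not Immich's real (much larger, versioned) API surface or my actual implementation.

```ts
// Sketch: an Immich-API-compatible server backed by our own storage layer.
// Route paths and response shapes are simplified stand-ins for illustration.
import express from "express";

const app = express();
app.use(express.json());

// Health check in the style of Immich's ping endpoint.
app.get("/api/server/ping", (_req, res) => {
  res.json({ res: "pong" });
});

// Hypothetical asset listing, served from our own cloud store.
app.get("/api/assets", async (_req, res) => {
  res.json(await listAssets());
});

// Placeholder standing in for the backend's real storage layer.
async function listAssets(): Promise<{ id: string; originalFileName: string }[]> {
  return [{ id: "example-id", originalFileName: "beach.jpg" }];
}

app.listen(2283); // the port Immich clients expect by default
```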
APIs and SDKs for Photo Intelligence
While Immich has a rich ecosystem of products and services built around its APIs, from tools to import your Google Photos or Apple Photos library to integrations with digital photo frames, my goal is broader: to power not just a Google Photos-like photo manager, but any product or workflow that involves photos and videos.
Next, I designed a more general set of APIs for my backend and exposed them with OpenAPI specifications, along with language-specific SDKs.
OpenAPI specifications rendered by Redocly
The APIs are intended to be straightforward, standard REST APIs. The magic is that after uploading assets, you can semantically search over them, list the faces that appear in them, see whether those faces are associated with people you've tagged, and so on.
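To give a flavor of that flow, here's a minimal sketch of the upload-then-query pattern. The base URL, paths, and field names are placeholders I invented for illustration, not the published spec:

```ts
// Placeholder base URL, paths, and field names, invented for illustration;
// they are not copied from the actual OpenAPI specification.
const BASE = "https://api.example.com/v1";
const headers = { Authorization: `Bearer ${process.env.API_KEY}` };

// 1) Upload an asset.
const form = new FormData();
form.append("file", new Blob([/* image bytes */], { type: "image/jpeg" }), "beach.jpg");
const asset = await fetch(`${BASE}/assets`, { method: "POST", headers, body: form })
  .then((r) => r.json());

// 2) Semantically search over everything uploaded so far.
const hits = await fetch(
  `${BASE}/assets/search?query=${encodeURIComponent("kids building a sandcastle")}`,
  { headers },
).then((r) => r.json());

// 3) List the faces detected in one asset, including any tagged people they match.
const faces = await fetch(`${BASE}/assets/${asset.id}/faces`, { headers })
  .then((r) => r.json());
```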
The SDKs, which are generated by Stainless, make the APIs even easier to use. I've personally vibe coded demo apps using the SDKs, and an early tester I'm collaborating with used Bolt to generate an app with them as well.
Screenshot from the README of the TypeScript SDK
Screenshot from the README of the Python SDK
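For readers who haven't used a Stainless-generated SDK, the shape is roughly as follows. The package name and method names here are hypothetical stand-ins I made up, not the real SDK surface shown in the screenshots:

```ts
// Hypothetical package and method names, sketching the typical shape of a
// Stainless-generated TypeScript SDK; the real SDK surface may differ.
import PhotoIntelligence from "photo-intelligence";

const client = new PhotoIntelligence({ apiKey: process.env.API_KEY });

async function main() {
  // Same operations as the raw REST calls above, but with typed requests
  // and responses, plus auth and retries handled by the client.
  const hits = await client.assets.search({ query: "sunset over the ocean" });
  for (const hit of hits.results) {
    console.log(hit.id, hit.score);
  }
}

main();
```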
An MCP for Photo Intelligence
Finally, I exposed the same set of APIs via the Model Context Protocol (MCP), as a proof of concept for integrating them with large language models that know how to use tools, like Anthropic's Claude. With the MCP server running, you can chat with a photo library, using the APIs above as tools to interact with it.
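Here's a minimal sketch of what that can look like with the official MCP TypeScript SDK (@modelcontextprotocol/sdk). The tool name and the endpoint it calls are placeholders; the real server wraps the full set of APIs as tools:

```ts
// Sketch of an MCP server exposing one photo-search tool over stdio.
// The tool name and backing endpoint are placeholders for illustration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "photo-intelligence", version: "0.1.0" });

server.tool(
  "search_photos",
  { query: z.string().describe("Natural-language description of photos to find") },
  async ({ query }) => {
    // Placeholder endpoint standing in for the platform's semantic search API.
    const res = await fetch(
      `https://api.example.com/v1/assets/search?query=${encodeURIComponent(query)}`,
      { headers: { Authorization: `Bearer ${process.env.API_KEY}` } },
    );
    // MCP tools return content blocks; the model sees this text as the result.
    return { content: [{ type: "text" as const, text: await res.text() }] };
  },
);

// Speak MCP over stdio so clients like Claude Desktop can launch the server.
await server.connect(new StdioServerTransport());
```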
Here’s a basic example of what Claude can do with the MCP server.
You can also ask Claude to tell you things about the photos.
And… if you really want it to, Claude can modify or delete your photo library as well.
Getting involved
Over the past month, many people have asked me how they can help, and admittedly I haven’t always had great answers. However, I really do appreciate all of the feedback and support I’ve received from the community, and here are a few things I would really appreciate:
If you have personal or business use cases for Photo Intelligence, I'd love to hear about them, especially ones I haven't mentioned in this newsletter before.
If you’d like to try building something with the platform, I’d love to hear from you. I’m not making anything self-serve right now, both because the platform isn’t ready for scaled consumption and because I want to understand everyone’s use cases better and work with them directly as needed.
If you know someone with a background in Applied AI as it relates to photos and videos, I'd love to chat with them. While I believe my background and experience give me a fairly unique ability to build a platform for Photo Intelligence from scratch, I'd love to learn more from people with deeper experience on the AI side.
In all of the cases above, drop me a comment below, or send me an email.
Special thanks
I wouldn’t have made it this far so quickly without all of your feedback and insights, and I wanted to make a few callouts:
Taylor Majewski: for reviewing my newsletter drafts with the keen eye of a writer and journalist, and especially for calling me out when there's context in my head that I shouldn't assume my readers also have.
Kenzo Fong: for showing me that it's possible to vibe code an app on my platform with Bolt (one that looked way better than the demo I hand-coded), and for expanding my thinking to a potential audience of people who aren't professional developers.
Spencer Adams-Rand: for reviewing my newsletter draft and suggesting that I add more ways for people to engage and get involved. It sounds so obvious now, but I’m not naturally the type to actively solicit help from others.
Ted