‘SmartLens’ app created by a high schooler is a step towards all-purpose visual search

A couple of years ago I was eagerly awaiting an app that could identify anything you pointed it at. Turns out the problem was much harder than anyone expected — but that didn’t stop high school senior Michael Royzen from trying. His app, SmartLens, attempts to solve the problem of seeing something and wanting to identify and learn more about it — with mixed success, to be sure, but it’s something I don’t mind having in my pocket.

Royzen reached out to me a while back and I was curious — as well as skeptical — about the idea that where the likes of Google and Apple have so far failed (or at least failed to release anything good), a high schooler working in his spare time would succeed. I met him at a coffee shop to see the app in action and was pleasantly surprised, but a little baffled.

The idea is simple, of course: You point your phone’s camera at something and the app attempts to identify it using an enormous but highly optimized classification agent trained on tens of millions of images. It connects to Wikipedia and Amazon to let you immediately learn more about what you’ve ID’ed, or buy it.
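To make that flow concrete, here is a minimal Python sketch of the general idea, with a stock ImageNet MobileNetV2 standing in for Royzen’s far larger proprietary model and Wikipedia’s public summary endpoint supplying the “learn more” step. It illustrates the pattern, not SmartLens’s actual code.

    # Rough sketch of the SmartLens-style flow: classify the photo, then look up
    # the top label. The classifier is a stock ImageNet MobileNetV2 stand-in, not
    # Royzen's model; the Wikipedia step uses the public REST summary endpoint.
    import requests
    import tensorflow as tf
    from tensorflow.keras.applications.mobilenet_v2 import (
        MobileNetV2, decode_predictions, preprocess_input)

    def identify(image_path: str) -> str:
        """Return the most likely label for the image."""
        model = MobileNetV2(weights="imagenet")
        img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
        x = preprocess_input(tf.keras.utils.img_to_array(img)[None, ...])
        top = decode_predictions(model.predict(x), top=1)[0][0]
        return top[1].replace("_", " ")  # human-readable class name

    def wikipedia_summary(title: str) -> str:
        """Fetch a short description of the label when the device is online."""
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
        return requests.get(url, timeout=5).json().get("extract", "No summary found.")

    label = identify("photo.jpg")
    print(label, "->", wikipedia_summary(label))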

It recognizes more than 17,000 objects — things like different species of fruit and flower, landmarks, tools and so on. The app had little trouble telling an apple from a (weird-looking) mango, a banana from a plantain and even identified the pistachios I’d ordered as a snack. Later, in my own testing, I found it quite useful for identifying the plants springing up in my neighborhood: periwinkles, anemones, wood sorrel, it got them all, though not without the occasional hesitation.

The kicker is that this all happens offline — it’s not sending an image over the cell network or Wi-Fi to a server somewhere to be analyzed. It all happens on-device and within a second or two. Royzen scraped his own image database from various sources and trained up multiple convolutional neural networks using days of AWS EC2 compute time.
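Royzen hasn’t published his training setup, but the broad recipe is familiar. Below is a minimal sketch, assuming a Keras MobileNetV2 fine-tuned on a directory of scraped images and a coremltools conversion so the result can run offline on the phone; the class count, data layout and tooling are all assumptions, not his actual pipeline.

    # Sketch: fine-tune a compact CNN on a folder of scraped images, then export
    # a Core ML model for offline, on-device inference. The class count, dataset
    # layout and tooling are assumptions for illustration only.
    import coremltools as ct
    import tensorflow as tf

    NUM_CLASSES = 17000  # roughly the number of objects SmartLens claims to know

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "scraped_images/", image_size=(224, 224), batch_size=64)

    base = tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", input_shape=(224, 224, 3))
    base.trainable = False  # train only the classification head at first

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1, input_shape=(224, 224, 3)),
        base,
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=5)  # days of EC2 GPU time at real scale

    # Convert for on-device use so no image ever leaves the phone.
    mlmodel = ct.convert(model, inputs=[ct.ImageType(shape=(1, 224, 224, 3))],
                         convert_to="mlprogram")
    mlmodel.save("SmartLensSketch.mlpackage")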

Beyond those 17,000 objects, the app recognizes far more products by reading the text on the item and querying the Amazon database. It ID’ed books, a bottle of pills and other packaged goods almost instantly, providing links to buy them. Wikipedia links pop up if you’re online as well, though a considerable number of basic descriptions are kept on the device.
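Presumably that product path hinges on the text-reading step. As a rough illustration (not SmartLens’s actual pipeline), one could OCR the photo and turn the most prominent words into an Amazon search link; pytesseract and the plain search URL below are stand-ins.

    # Sketch: read the text on a packaged product and build an Amazon search link.
    # pytesseract and the plain search URL are stand-ins for whatever SmartLens
    # actually uses; that pipeline isn't public.
    from urllib.parse import quote_plus

    from PIL import Image
    import pytesseract

    def product_search_link(image_path: str, max_words: int = 6) -> str:
        """OCR the photo and turn the first few words into an Amazon search URL."""
        text = pytesseract.image_to_string(Image.open(image_path))
        words = [w for w in text.split() if w.isalnum()][:max_words]
        query = " ".join(words) or "unknown product"
        return "https://www.amazon.com/s?k=" + quote_plus(query)

    print(product_search_link("book_cover.jpg"))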

On that note, it must be said that SmartLens is a more than 500-megabyte download. Royzen’s model is huge, since it must keep all the recognition data and offline content right there on the phone. It’s a very different approach to the problem from Amazon’s own product recognition engine on the Fire Phone (RIP), Google Goggles (RIP) or the scan feature in Google Photos (which was pretty useless for things SmartLens reliably did in half a second).

“With the several past generations of smartphones containing desktop-class processors and the advent of native machine learning APIs that can harness them (and GPUs), the hardware exists for a blazing-fast visual search engine,” Royzen wrote in an email. But none of the large companies you would expect to create one has done so. Why?

The app’s size and its toll on the processor are one thing, for sure, but the edge and on-device processing are where all this stuff will go eventually — Royzen is just getting an early start. The likely truth is twofold: it’s hard to make money, and the quality of the search isn’t high enough.

It must be said at this point that SmartLens, while smart, is far from infallible. Its suggestions for what an item might be are almost always hilariously wrong for a moment before the app arrives, as it often does, at the correct answer.

It identified one book I had as “White Whale,” and no, it wasn’t Moby Dick. An actual whale paperweight it decided was a trowel. Many items briefly flashed guesses of “Human being” or “Product design” before getting to a guess with higher confidence. One flowering bush it identified as four or five different plants — including, of course, Human Being. My monitor was a “computer display,” “liquid crystal display,” “computer monitor,” “computer,” “computer screen,” “display device” and more. Game controllers were all “control.” A spatula was a wooden spoon (close enough), with the inexplicable subheading “booby prize.” What?!

This level of performance (and weirdness in general, however entertaining) wouldn’t be tolerated in a standalone product released by Google or Apple. Google Lens was slow and bad, but it’s just an optional feature inside a working, useful app. If Google put out a visual search app that identified flowers as people, the company would never hear the end of it.

And the other side of it is the monetization aspect. Although it’s theoretically convenient to be able to snap a picture of a book your friend has and instantly order it, it isn’t so much more convenient than taking a picture and searching for it later, or just typing the first few words into Google or Amazon, which will do the rest for you.

Meanwhile, for the user there is still confusion. What can it identify? What can’t it identify? What do I need it to identify? It’s meant to ID many things, from dog breeds to storefronts, but it likely won’t identify, for example, a cool Bluetooth speaker or mechanical watch your friend has, or the creator of a painting at a local gallery (some paintings are recognized, though). As I used it, I felt I would only ever rely on it for a handful of tasks in which it had proven itself, like identifying flowers, and would hesitate to try it on much else for fear of being frustrated by some unknown incapability or unreliability.

And yet the idea that the very near future won’t hold something just like SmartLens strikes me as ridiculous. It seems so clearly something we will all take for granted in a few years. And it’ll be on-device, with no need to upload your image to a server somewhere to be analyzed on your behalf.

Royzen’s app has its issues, but it works very well in many circumstances and has obvious utility. The idea that you could point your phone at the restaurant you’re across the street from and see Yelp reviews two seconds later — no need to open up a map or type in an address or name — is an extremely natural expansion of existing search paradigms.

“Visual search is still a niche, but my goal is to give people the taste of a future where one app can deliver useful information about anything around them — today,” wrote Royzen. “Still, it’s inevitable that big companies will launch their competing offerings eventually. My strategy is to beat them to market as the first universal visual search app and amass as many users as possible so I can stay ahead (or be acquired).”

My biggest gripe of all, however, is not with the functionality of the app but with how Royzen has decided to monetize it. Users can download it for free but, upon opening it, are immediately prompted to sign up for a $2/month subscription (though the first month is free) — before they can even see whether the app works or not. If I didn’t already know what the app did and didn’t do, I would delete it without a second thought upon seeing that dialog, and even knowing what I do, I’m not likely to pay in perpetuity for it.

A one-time fee to activate the app would be more than reasonable, and there’s always the option of referral codes for those Amazon purchases. But demanding rent from users who haven’t even tested the product is a non-starter. I’ve told Royzen my concerns and I hope he reconsiders.

It would also be nice to be able to scan images you’ve already taken, or to save images associated with searches. UI improvements like a confidence indicator, or some kind of feedback to let you know it’s still working on an identification, would be welcome as well — features that are at least theoretically on the way.
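One plausible way to drive such a confidence indicator, and to tame the flicker of low-confidence guesses described earlier, is to smooth the per-frame scores and only surface a label once it stays confident. Here is a toy sketch of that idea, not SmartLens’s actual logic.

    # Toy sketch: exponentially smooth per-frame class scores and only display a
    # label once its smoothed confidence clears a threshold. Illustrative only.
    from __future__ import annotations

    from collections import defaultdict

    class LabelSmoother:
        def __init__(self, alpha: float = 0.5, threshold: float = 0.5):
            self.alpha = alpha          # weight given to the newest frame
            self.threshold = threshold  # minimum smoothed score to display
            self.scores: dict[str, float] = defaultdict(float)

        def update(self, frame_predictions: dict[str, float]) -> str | None:
            """frame_predictions maps label -> softmax score for one camera frame."""
            for label in set(self.scores) | set(frame_predictions):
                new = frame_predictions.get(label, 0.0)
                self.scores[label] = (1 - self.alpha) * self.scores[label] + self.alpha * new
            best_label, best_score = max(self.scores.items(), key=lambda kv: kv[1])
            return best_label if best_score >= self.threshold else None

    smoother = LabelSmoother()
    for preds in [{"human being": 0.5}, {"wood sorrel": 0.7}, {"wood sorrel": 0.9}]:
        print(smoother.update(preds))  # None, None, then "wood sorrel"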

In the end I’m impressed with Royzen’s efforts — when I take a step back it’s amazing to me that it’s possible for a single person, let alone one in high school, to put together an app capable of completing such sophisticated computer vision tasks. It’s the kind of (over-) ambitious app-building one expects to come out of a big, playful company like the Google of a decade ago. This may be more of a curiosity than a tool right now, but so were the first text-based search engines.

SmartLens is in the App Store now — give it a shot.

Source: Mobile – TechCrunch

Who’s a good AI? Dog-based data creates a canine machine learning system

We’ve trained machine learning systems to identify objects, navigate streets and recognize facial expressions, but as difficult as those tasks may be, they don’t even approach the level of sophistication required to simulate, for example, a dog. Well, this project aims to do just that — in a very limited way, of course. By observing the behavior of A Very Good Girl, this AI learned the rudiments of how to act like a dog.
It’s a collaboration between the University of Washington and the Allen Institute for AI, and the resulting paper will be presented at CVPR in June.
Why do this? Well, although much work has been done to simulate the sub-tasks of perception like identifying an object and picking it up, little has been done in terms of “understanding visual data to the extent that an agent can take actions and perform tasks in the visual world.” In other words, act not as the eye, but as the thing controlling the eye.
And why dogs? Because they’re intelligent agents of sufficient complexity, “yet their goals and motivations are often unknown a priori.” In other words, dogs are clearly smart, but we have no idea what they’re thinking.
As an initial foray into this line of research, the team wanted to see if by monitoring the dog closely and mapping its movements and actions to the environment it sees, they could create a system that accurately predicted those movements.
In order to do so, they loaded up a Malamute named Kelp M. Redmon with a basic suite of sensors. There’s a GoPro camera on Kelp’s head, six inertial measurement units (on the legs, tail and trunk) to tell where everything is, a microphone and an Arduino that tied the data together.
They recorded many hours of activities — walking in various environments, fetching things, playing at a dog park, eating — syncing the dog’s movements to what it saw. The result is the Dataset of Ego-Centric Actions in a Dog Environment, or DECADE, which they used to train a new AI agent.
This agent, given certain sensory input — say a view of a room or street, or a ball flying past it — was to predict what a dog would do in that situation. Not to any serious level of detail, of course — but even just figuring out how to move its body and to where is a pretty major task.
“It learns how to move the joints to walk, learns how to avoid obstacles when walking or running,” explained Hessam Bagherinezhad, one of the researchers, in an email. “It learns to run for the squirrels, follow the owner, track the flying dog toys (when playing fetch). These are some of the basic AI tasks in both computer vision and robotics that we’ve been trying to solve by collecting separate data for each task (e.g. motion planning, walkable surface, object detection, object tracking, person recognition).”
That can produce some rather complex data: For example, the dog model must know, just as the dog itself does, where it can walk when it needs to get from here to there. It can’t walk on trees, or cars, or (depending on the house) couches. So the model learns that as well, and this can be deployed separately as a computer vision model for finding out where a pet (or small legged robot) can get to in a given image.
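The authors’ architecture isn’t reproduced here, but the overall shape of the task, encoding what the dog’s camera sees and predicting its next body movements, can be sketched with a CNN frame encoder feeding a recurrent head. The PyTorch snippet below is a rough illustration with assumed dimensions, not the paper’s model.

    # Rough sketch of the "act like a dog" setup: encode ego-centric video frames
    # with a CNN and predict the dog's body movements (IMU readings) over time.
    # The dimensions and architecture are assumptions, not the paper's model.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class DogActionPredictor(nn.Module):
        def __init__(self, num_imus: int = 6, imu_dim: int = 4, hidden: int = 256):
            super().__init__()
            backbone = resnet18(weights=None)
            backbone.fc = nn.Identity()  # keep the 512-d frame features
            self.encoder = backbone
            self.rnn = nn.GRU(512, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_imus * imu_dim)  # e.g. a quaternion per IMU

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (batch, time, 3, H, W) from the head-mounted camera
            b, t = frames.shape[:2]
            feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
            out, _ = self.rnn(feats)
            return self.head(out)  # predicted movement for each time step

    model = DogActionPredictor()
    clip = torch.randn(2, 8, 3, 224, 224)  # two 8-frame clips
    print(model(clip).shape)  # torch.Size([2, 8, 24])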
This was just an initial experiment, the researchers say: a success, but one with limited results. Others may consider bringing in more senses (smell is an obvious one) or seeing how a model produced from one dog (or many) generalizes to other dogs. They conclude: “We hope this work paves the way towards better understanding of visual intelligence and of the other intelligent beings that inhabit our world.”

Source: Gadgets – TechCrunch

Autonomous cars could peep around corners via bouncing laser

Autonomous cars gather up tons of data about the world around them, but even the best computer vision systems can’t see through brick and mortar. By carefully monitoring the reflected light of a laser bouncing off a nearby surface, however, they might be able to see around corners — that’s the idea behind recently published research from Stanford engineers.

Source: Gadgets – TechCrunch