Computer vision models are created to perform a specific task, for example, detecting or classifying objects in camera footage. But before they can perform this task, they have to be trained. This training happens in much the same way humans learn: we show the model large amounts of example images. Those images can be of something we are trying to find, or of something we don't want to find. By repeating this process for many hours on powerful computers, the AI model ends up learning its task, which means you can then show it images it has never seen and it will give you the correct answer.
For years, researchers and companies have been collecting data to train better and better AI models, because the more unique data you show a model, the better it will perform its task. These collections of data are what we call "datasets", and many of them are publicly available for free. If you're curious to see what a dataset looks like, we recommend MSCOCO or the Open Images Dataset.
You also need to explain to your AI model what you expect from it. This is done through a process called labeling, or annotating. It's similar to tagging faces on Facebook or adding keywords to your holiday snaps. First, you draw a rectangle (or bounding box) around the object of interest, and then you put a label on it. You repeat these steps for each object in the image, and this has to be done for every single image you would like your model to train on. While labeling a few images yourself might be fun and certainly educational, a good-sized dataset can easily run into the hundreds of thousands of pictures, each taking up to a minute to label. You do the math.
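To make this concrete, here is a minimal sketch of what one labeled image might look like in a COCO-style annotation format (the field names follow the public MSCOCO convention; the file name, sizes, and coordinates are made up for illustration):

```python
# A minimal COCO-style annotation: one image with one labeled bounding box.
# Boxes are [x, y, width, height] in pixels; all values here are illustrative.
annotation = {
    "images": [
        {"id": 1, "file_name": "street_001.jpg", "width": 1920, "height": 1080}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [410, 220, 180, 370]}
    ],
    "categories": [{"id": 1, "name": "person"}],
}

# Every image in the dataset needs entries like these -- which is why
# labeling hundreds of thousands of images by hand takes so long.
print(len(annotation["annotations"]), "labeled object(s)")
```

Multiply one record like this by every object in every image, and the scale of the manual labeling problem becomes obvious.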
To solve this part of the process, people started using crowd-sourcing platforms, which break the work down into smaller jobs and hand it out to thousands of workers. It's a relatively fast way of solving the annotation problem, but also quite expensive and difficult to manage.
Today, you can find dozens of annotation companies that specialize in labeling your pictures, often using workforces in low-income countries. Besides being expensive for large datasets, labeling mistakes are also extremely common. Imagine sitting at your desk doing the same work for hours on end, almost like working on a conveyor belt in a factory. You're bound to make the odd mistake here and there. Unfortunately, these mistakes can create real problems when they end up in your AI training process. It's like learning from a school textbook with mistakes in the assignments: it's confusing and leaves you uncertain about your actual understanding of the subject.
And this is where synthetic data comes into the picture (pun intended).
Synthetic data is a relatively new technology in the field of Artificial Intelligence, and at CVEDIA, we have been pioneering its use for years.
In short, synthetic data removes the need for data collection by generating, or rendering, the training images. While there are several types of synthetic data, we'll focus on the 3D rendering approach here. It is by far the most powerful type and gives the greatest flexibility.
Rendering is a process very similar to how Hollywood creates animated movies. Instead of using real actors and objects, it draws them from 3D models. A 3D model is a digital version of an object or person: it describes the object's size, shape, and looks so that a computer can visualize it on screen.
Because rendering is very fast, you can create thousands of images per hour without leaving your desk. The rendering can take place on any regular desktop computer, often using the same technology that is used to create video games.
For this to work correctly, you need a large collection of 3D models, or you'll run into the same problems as real data collection. At CVEDIA, we have over 30,000 3D models covering many types of objects, ranging from clothing and exotic animals to buildings and ships.
It's this rich variety in objects that gives rendered synthetic data its power.
Since the objects you render are virtual, you also have a lot of freedom. You could place any type of backpack on a 3D model of a person and give it hundreds of different colors. Each generated picture directly replaces the need for an actual picture of a person with that backpack, which means you won't have to go out into the world and collect pictures of it. This is a huge time saver, especially if what you're trying to detect or classify is a rare or very specific object.
In some cases there might simply not be any data at all. Take, for example, our work with RESOLVE. They developed a tiny camera called WildEyes AI, which runs AI to detect poachers and protect endangered species. The species they are trying to protect are nearly extinct, which makes collecting images of them in the wild nearly impossible. Using synthetic data, we were able to recreate such images, for example of the snow leopard, and successfully train an AI model to detect this animal.
In many ways it looks just like the real thing. There's a video here that shows what it looks like. Unless you've got a trained eye, it might look real to you. But it's not!
This video was rendered using only 3D models, and it allowed us to train an AI model for use in a smart city application.
If you'd like to see more examples of what synthetic data looks like, you can view them on our CVEDIA YouTube channel.
So far we've focused on 3D rendering to generate synthetic data. This is our method of choice at CVEDIA, but there are two other types:
Although it sounds like a chicken-and-egg problem, AI can be used to generate training data as well. The deep-learning-based synthetic data method (1) does not give you the same level of control or flexibility as 3D rendering, though. There are no 3D models or cameras, and you have no control over the annotation labels. It works by training a neural network on either a public dataset or your own data, after which it can "synthesize" more images like the ones it has already seen. This means it can only generate more of the same: no new object types or conditions. But it can help overcome certain biases when you're low on data.
Augmentation (2) is not the first thing that comes to mind for many people when you talk about synthetic data. But in effect, it's also a process of "synthesizing" data: data that wasn't captured by a camera, but instead created by a computer.
Augmentation is the process of changing a photo to be different enough for a network to learn something new from it, but similar enough that the meaning remains the same. For example, flipping a photo horizontally (like a mirror) is a completely different image for a neural network, but the meaning is still the same.
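The horizontal flip described above can be sketched in a few lines of NumPy. The 2×3 "image" here is a toy example: each number stands in for a pixel intensity.

```python
import numpy as np

# A tiny 2x3 single-channel "image" (rows x columns); values are pixel intensities.
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Horizontal flip (mirror): reverse the column order.
# To a neural network this is a brand-new image, but its meaning is unchanged.
flipped = image[:, ::-1]

print(flipped.tolist())  # [[3, 2, 1], [6, 5, 4]]
```

Rotations, crops, brightness shifts, and noise work on the same principle: cheap pixel-level transformations that preserve the label.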
There are hundreds of different augmentation types, and people use them a lot to extract more value from a small dataset. Augmentations aren't exclusive, either: they work perfectly well together with deep learning synthetic data and our 3D rendering approach.
That really depends on what you are trying to do. Synthetic datasets are great when you want your application to work anywhere, anytime, and in unknown situations. But sometimes an AI is only run from very similar camera viewpoints, for example, within the same type of environment, or when the camera position never changes. Here it might be better just to collect real data and benefit from the biases it contains (yes, biases can be a good thing, too!). In a way, we call this process 'overfitting'.
Another major problem that plagues many AI applications is "bias". Because training data is often collected from specific regions of the world, it's highly biased. Simply put, there are a lot more white people in most datasets than there are black or Asian people. This imbalance in data can cause all kinds of wrong behavior in a neural network; behavior that is hard to explain or justify, because often we don't understand why AI models do what they do. Racial bias is, of course, just one of many examples. Think about gender fluidity, body composition, rare genetic disorders, culture, and so on, and that's just talking about people. The same applies to vehicle models, building styles, animal species, and even the color of water in different parts of the world. No matter how well you collect your data in the real world, the distribution will always be biased on some level.
So how does synthetic data help with this?
Because the training data is rendered, we have much more control over what it looks like. For example, we can generate images of people with different ethnicities or body types in a fair and balanced way. By showing these balanced examples to a neural network, you remove its opportunity to learn a bias. Unfortunately, neural networks are extremely good at finding patterns; that's what we expect them to do in the first place. So any kind of feature, whether it's on a person, building, or car, can become some sort of bias. You solve one, another pops up. This game of whack-a-mole is what we spend a lot of our time on at CVEDIA.
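One way to picture "generating data in a fair and balanced way" is to enumerate every combination of attributes equally often, instead of sampling with whatever skew a real-world collection would have. The sketch below is illustrative only; the attribute names and pools are hypothetical, not CVEDIA's actual pipeline.

```python
import itertools
import random

# Hypothetical attribute pools for a rendered person.
skin_tones = ["I", "II", "III", "IV", "V", "VI"]  # e.g. Fitzpatrick-style scale
body_types = ["slim", "average", "heavy"]

def balanced_batches(rounds):
    """Yield every (skin_tone, body_type) combination exactly once per round,
    in a shuffled order, so each combination appears equally often overall."""
    combos = list(itertools.product(skin_tones, body_types))
    for _ in range(rounds):
        random.shuffle(combos)  # vary the ordering, keep the counts equal
        yield from combos

# Count how often each combination would be rendered across 10 rounds.
counts = {}
for tone, body in balanced_batches(rounds=10):
    counts[(tone, body)] = counts.get((tone, body), 0) + 1

# Every combination was scheduled exactly 10 times: a perfectly flat distribution.
print(all(c == 10 for c in counts.values()))  # True
```

Real data can only approximate this flatness; rendered data can enforce it by construction, which is the point of the paragraph above.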
Imagine a situation in which you are collecting training data but legally cannot use it. This could be something as simple as recording people on the streets in Germany. Without proper blurring of faces, bodies or license plates, you're at legal risk. Unfortunately, blurring those objects creates images that simply won't work for AI training.
Or think about training data for applications that run inside prisons, hospitals, or nursing homes. All places where data collection is off-limits because of privacy concerns. Yet, people still expect their surveillance cameras or AI care-takers to work correctly.
With synthetic data we can recreate those environments and the people in them and generate data for any task you want to perform. This means you won't have to collect any privacy sensitive data and your application will still work fine.
After creating an AI model with either real or synthetic data, you'll usually want to measure its performance. You'll want to make sure that nothing changed or broke between updates. You can use real data for this, but it presents a very limited test at best. If you capture validation data in the summer, you likely won't have any images of snow or rain. The same goes for video recorded from a handheld camera: there won't be any top-down views.
Using synthetic data, you create a scene only once, for example, a ship on the ocean, and then change settings like the weather, the ocean conditions, or even the time of day. From that single scene you can synthesize thousands of different variations.
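The "one scene, many conditions" idea boils down to a parameter sweep. Here's a minimal sketch; the `render` function is a stand-in for a real renderer, and the parameter names and values are made up for illustration:

```python
import itertools

# Hypothetical condition pools for a "ship on the ocean" validation scene.
weather = ["clear", "rain", "fog", "snow"]
sea_state = ["calm", "moderate", "rough"]
time_of_day = ["dawn", "noon", "dusk", "night"]

def render(scene, **params):
    # Stand-in for a real renderer call; returns a label for the variation.
    settings = ",".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"{scene}[{settings}]"

# Sweep every combination of conditions over the single scene.
variations = [
    render("ship_on_ocean", weather=w, sea=s, time=t)
    for w, s, t in itertools.product(weather, sea_state, time_of_day)
]

print(len(variations))  # 4 * 3 * 4 = 48 test images from one scene
```

Add another condition axis (camera height, fog density, wave direction) and the number of test cases multiplies again, all without touching a real camera.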
Now that is powerful validation!
We hope this helped improve your understanding of synthetic data. We're not going to lie to you: if you're looking to develop your own synthetic data from scratch, it's tough. It's taken us years to get where we are today, with many roadblocks along the way and significant investments in building 3D models.
If you are only interested in getting to a solution fast, we are happy to have a conversation about it. We have dozens of ready-made solutions based on synthetic data that can be integrated directly into your camera or hardware platform, which means you won't have to pay for development and can save months of work.
CVEDIA is a vibrant, fully remote organization that has pioneered the use of synthetic data. Our mission is to make computer vision affordable, no matter how niche the application. All of our technology has been developed from the hard-learned lessons of getting our clients' solutions deployed. Whether it's on the edge, near-edge, or cloud, we have yet to encounter a chipset we couldn't deploy on.