“Machine Learning is 99% Manual Labor” — and what I’ve been up to

Over the past year, I’ve been building a solution to a problem I frequently encountered as a data scientist. To train the machine learning models I was building, I needed clean, labeled data. I often ended up manually reviewing and labeling data myself, a process that was tedious, lengthy, boring, error-prone, and probably not the best use of my time. As a fellow data scientist eloquently put it, “Machine Learning is 99% Manual Labor.”

“For every one hour driven, it’s approximately 800 hours to label”

A year ago, we started Clay Sciences to come up with a better way. Currently, we are focused on the challenge of labeling, or annotating, videos that serve as training data. Self-driving cars and smart security cameras are two examples of areas with a pressing need for annotated video to train machine learning models, often thousands of hours of it. One company in this space, drive.ai, recently noted that “for every one hour driven, it’s approximately 800 human hours to label.” Those are astonishing numbers.

“I just love cleaning and labeling data” — no data scientist, ever

Over the past year, we developed a simpler way to annotate videos. Our solution is scalable (it can handle any number of videos of any length), accurate, and simple to use. Data scientists upload their videos to our platform, specify what they want to annotate, and get back their annotated videos when they’re ready (typically within a day).

Under the hood, the platform combines crowdsourcing across thousands of qualified workers, automated quality control, and complex pipelines that process videos and annotations.

Here is an example of what a video annotated with our platform looks like:

If you’re interested in learning more or think this can be helpful for your team, feel free to reach out (info@claysciences.com).
