CYBERSPACE — One of the most popular image databases used by Artificial Intelligence (AI) and Machine Learning entrepreneurs and researchers reportedly contains a vast number of images harvested from adult sites without the explicit authorization of the people depicted in them.
ImageNet, an image library used worldwide to train AI systems to recognize and generate images, includes images depicting “porn actresses,” the U.K.’s The Register reported today.
“The library consists of 14 million images, each placed into categories that describe what's pictured in each scene,” wrote The Register’s Emerging Tech reporter Katyanna Quach. “This pairing of information — images and labels — is used to teach artificially intelligent applications to recognize things and people caught on camera.”
According to the article, a typo led a Bay Area researcher to stumble upon the adult content.
“ImageNet's categories are sorted in alphabetic order and referenced by software in numerical ascending order,” the article continues. “In a subset of the database [UnifyID’s Chief Scientist Vinay] Prabhu was using for his research, bicycles were category 444, and bikinis were 445. A single-digit typo in his code caused his neural network to draw from category 445 — bikinis — rather than the bicycles in 444.”
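The mechanism the article describes, alphabetical sorting followed by ascending numeric IDs, can be illustrated with a minimal Python sketch. The four labels below are hypothetical stand-ins, not the actual subset Prabhu used, and the resulting index values are illustrative only; they do not reproduce the 444 and 445 cited in the article.

```python
# Hypothetical label subset, chosen only to show how alphabetical
# ordering places unrelated categories next to each other.
labels = ["bikini", "bicycle", "binoculars", "bib"]

# Sort the names alphabetically, then assign ascending integer IDs.
class_to_index = {name: i for i, name in enumerate(sorted(labels))}
index_to_class = {i: name for name, i in class_to_index.items()}

# The ID the researcher means to request.
intended = class_to_index["bicycle"]

# A one-digit slip in the ID selects the alphabetical neighbor instead.
mistyped = intended + 1

print(index_to_class[intended])
print(index_to_class[mistyped])
```

Because the IDs carry no meaning beyond sort position, nothing in the code flags that the neighboring category is semantically unrelated to the one intended.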
“At first I found it amusing, and I decided to look through the data set,” Prabhu said.
He found “photos of a naked child’s backside, porn stars, shenanigans at frat parties, plus private and intimate photos of men dressed in women’s underwear.”
“It was clear that these were unethical,” Prabhu added.
“Sometimes the nature of what is pornographic is debatable, but in some cases, the links to the porn websites are included right in the images,” he told The Register.
XBIZ has previously reported on image banks like ImageNet and their controversial use of unauthorized adult content to generate “Virtual Porn Stars.” See “AI Gone Wild: A 'Disruptor' Tries (and Fails) to Generate Synthetic Porn Stars.”
The Register article is titled “Inside the 1TB ImageNet data set used to train the world's AI: Naked kids, drunken frat parties, porno stars, and more.”