Object Detection

With over 200 new classes of objects, the Object Detection subsystem enhances Lightship's contextual awareness by creating semantically labeled 2D bounding boxes that update dynamically as real-world objects appear on-screen. For each bounding box, the subsystem processes the central square crop of the image, makes an independent prediction for every subclass, and returns the probability that the detected object belongs to each of them. Lightship Object Detection also provides the model card below, which explains how detections were trained for a person, a human hand, or a human face.

Image with Bounding Boxes around Detected Objects

Basic Usage

By placing Lightship's ARObjectDetectionManager in a scene and subscribing to the ObjectDetectionsUpdated event, developers can receive real-time detection information in the form of XRDetectedObjects. You can also listen for the MetadataInitialized event to receive the list of object classes when the model becomes available to use.

The frame rate of the ARObjectDetectionManager can also be adjusted to save performance or detect objects at a faster rate.

Image displaying ARObjectDetectionManager properties

Object Detection Categories

There are 206 different categories that the neural network looks for inside a bounding box.

Category List

Category	Description
human_face	human face
human_hand	human hand
person	person, man, woman, boy, girl, human body
skull	skull
aircraft	aircraft, airplane, helicopter, rocket, parachute
bicycle	bicycle, stationary bicycle, unicycle
boat	boat, watercraft, barge, gondola, canoe, jet ski, submarine, personal flotation device
bus	bus
car	car, snowmobile, golf cart, tank, snowplow, ambulance, van, limousine, taxi, bus, truck
cart	cart
motorcycle	motorcycle
taxi	taxi
train	train
truck	truck
vehicle	vehicle, car, land vehicle, snowmobile, golf cart, tank, train, snowplow, ambulance, bicycle, unicycle
wheel	wheel, tire, bicycle wheel
wheelchair	wheelchair
bench	bench
billboard	billboard, scoreboard
christmas_tree	christmas_tree
door	door
door_handle	door_handle
fire_hydrant	fire_hydrant
flag	flag
parking_meter	parking_meter
poster	poster, picture frame
sculpture	sculpture, bust, bronze sculpture
street_light	street_light
traffic_light	traffic_light
traffic_sign	traffic_sign, stop sign
waste_container	waste_container, garbage bin, trash can
water_feature	water_feature, swimming_pool, jacuzzi, fountain
window	window (both indoor and outdoor)
backpack	backpack
clothing	clothing, sports uniform
coat	coat, jacket
dress	dress
fedora	fedora, sun hat, cowboy hat
footwear	footwear, roller skates, boot, high heels, sandal
glasses	glasses, sunglasses, goggles
handbag	handbag, briefcase, picnic basket, luggage and bags
headwear	headwear, hat, cowboy hat, fedora, sombrero, sun hat, swim cap, helmet, bicycle helmet, football helmet
roller_skates	roller_skates
shirt	shirt
shorts	shorts
skirt	skirt, miniskirt
sock	sock
suit	suit
suitcase	suitcase, briefcase
tie	tie
trousers	trousers, jeans
umbrella	umbrella
baseball_bat	baseball_bat
baseball_glove	baseball_glove
football	football (soccer)
frisbee	frisbee, flying disc
kite	kite
paddle	paddle
rugby_ball	rugby_ball
skateboard	skateboard
skis	skis, ski
snowboard	snowboard
sports_ball	sports_ball, ball, football, cricket ball, volleyball, tennis ball, rugby ball
surfboard	surfboard
tennis_ball	tennis_ball
tennis_racket	tennis_racket, table tennis racket, racket
accordion	accordion
brass_instrument	brass_instrument, french horn, saxophone, trombone, trumpet
drum	drum
flute	flute, harmonica, oboe
guitar	guitar
musical_instrument	musical_instrument, organ, banjo, cello, drum, french horn, guitar, harp, harpsichord, harmonica, oboe,
piano	piano, organ, harpsichord, musical keyboard
string_instrument	string_instrument, guitar, banjo, cello, harp, violin
violin	violin
apple	apple
banana	banana
berry	berry, strawberry, raspberry
broccoli	broccoli
carrot	carrot
citrus	citrus, orange, lemon, grapefruit
coconut	coconut
egg	egg
food	food, fast food, hot dog, french fries, waffle, pancake, burrito, snack, pretzel, popcorn, cookie,
grape	grape
mushroom	mushroom
pear	pear
pumpkin	pumpkin, squash
tomato	tomato
drink	drink, beer, cocktail, coffee, juice, tea, wine, bottle
hot_drink	hot_drink, tea, coffee
juice	juice
bread	bread
cake	cake, tart, muffin
cheese	cheese
dessert	dessert, ice cream, cake, muffin, doughnut, donut, bagel, cookie, biscuit, waffle, pancake,
donut	donut, doughnut, bagel, pretzel
fast_food	fast_food, hot_dog, french_fries, pizza, burrito, hamburger, sandwich
french_fries	french_fries
hamburger	hamburger
hot_dog	hot_dog
ice_cream	ice_cream
pizza	pizza
sandwich	sandwich, submarine sandwich, burrito
sushi	sushi
bed	bed, infant bed, dog bed
chair	chair, stool
couch	couch, sofa, studio couch, loveseat, sofa bed
furniture	furniture, chair, cabinetry, desk, wine rack, couch, sofa bed, loveseat, wardrobe, nightstand,
shelves	shelves, wine rack, bookcase, spice rack
storage_cabinet	storage_cabinet, wardrobe, cupboard, closet, cabinetry, filing cabinet, chest of drawers, bathroom cabinet
table	table, dining table, desk, coffee table, kitchen table, billiard table, countertop, nightstand,
bathtub	bathtub
fireplace	fireplace, wood-burning stove
microwave	microwave, microwave oven
oven	oven
refrigerator	refrigerator
screen	screen, tv, television, computer monitor, tablet computer
sink	sink
tap	tap, shower
toaster	toaster
toilet	toilet, bidet
balloon	balloon
barrel	barrel
book	book
bottle	bottle
bowl	bowl, mixing bowl
box	box
camera	camera, binoculars
candle	candle
cannon	cannon
chopsticks	chopsticks
clock	clock, wall clock, alarm clock
coin	coin
computer_keyboard	computer_keyboard, keyboard
computer_mouse	computer_mouse
cooking_pan	cooking_pan, frying pan, wok, waffle iron, slow cooker, pressure cooker
cup	cup, mug, coffee cup
curtain	curtain, window blind
doll	doll
flowerpot	flowerpot, vase
fork	fork
hair_dryer	hair_dryer
headphones	headphones
jug	jug, measuring cup, teapot, cocktail shaker, pitcher, beaker, kettle
knife	knife, kitchen knife, pizza cutter, chisel, dagger, sword
lamp	lamp, lantern, candle, light bulb, flashlight, torch, ceiling fan
laptop	laptop
microphone	microphone
pen	pen, pencil
phone	phone, telephone, cell phone, mobile phone, smartphone, corded phone, ipod
pillow	pillow
plate	plate, saucer, platter, cake stand
potted_plant	potted_plant, houseplant
remote	remote, remote control
scissors	scissors
snowman	snowman
spoon	spoon, ladle, spatula
teapot	teapot, kettle
teddy_bear	teddy_bear
tin_can	tin_can, cooking spray
toothbrush	toothbrush
toy	toy, doll, dice, flying disc, teddy bear
watch	watch
wine_glass	wine_glass
flower	flower
rose	rose
sunflower	sunflower
animal	animal, squid, shellfish, oyster, lobster, shrimp, crab, bird, magpie, woodpecker, blue jay, ostrich,
bird	bird, magpie, woodpecker, blue jay, ostrich, penguin, raven, chicken, eagle, owl, duck, canary, goose,
parrot	parrot
water_bird	water_bird, duck, goose, swan
butterfly	butterfly, moths and butterflies
insect	insect, tick, centipede, isopod, bee, beetle, ladybug, ant, moths and butterflies, caterpillar, butterfly
dolphin	dolphin
fish	fish, goldfish, shark, "Rays and Skates", seahorse, squid
goldfish	goldfish
jellyfish	jellyfish
seal	seal, sea lion, harbor seal, walrus
shellfish	shellfish, lobster, oyster, shrimp, crab, starfish, snail
whale	whale
alpaca	alpaca
bear	bear, brown bear
big_cat	big_cat, lynx, jaguar, tiger, lion, leopard, cheetah
camel	camel
cat	cat
cow	cow, bull, cattle
crocodile	crocodile, alligator
deer	deer, antelope
dog	dog
elephant	elephant
frog	frog
giraffe	giraffe
hippopotamus	hippopotamus
horse	horse, donkey, mule
kangaroo	kangaroo
panda	panda
pig	pig
polar_bear	polar_bear
rabbit	rabbit
reptile	reptile, lizard, snake, turtle, tortoise, sea turtle, crocodile, frog
rhinoceros	rhinoceros
sheep	sheep, goat
squirrel	squirrel
turtle	turtle, tortoise, sea turtle
zebra	zebra

note

Some of these categories are also covered by one of the other categories; for example, "cat," "dog," and a few others fall under the "animal" category. The neural network makes an independent prediction for each of the 206 categories. For example, it will predict that the bounding box of a cat is both a "cat" and an "animal" with relatively high, but likely different, confidences, and neither is guaranteed to always score higher than the other. So if your application is looking for a specific type of object (in this case, either "cat" or "animal"), check the several most confident categorizations for each bounding box instead of only the single most confident one.
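The check described in this note can be sketched in a few lines of Python. This is a minimal illustration and not Lightship API code: it assumes the per-category confidences for one bounding box have already been copied into a plain dictionary, and the function names, the sample confidences, and the default of k = 5 are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def top_k_categories(confidences: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-confidence (category, confidence) pairs, best first."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def matches_any(confidences: Dict[str, float], wanted: Iterable[str], k: int = 5) -> bool:
    """True if any wanted category is among the top-k categorizations of the box."""
    wanted_set = set(wanted)
    return any(name in wanted_set for name, _ in top_k_categories(confidences, k))


# Hypothetical confidences for a single bounding box around a cat.
box = {"animal": 0.91, "cat": 0.87, "dog": 0.22, "teddy_bear": 0.08, "pillow": 0.03}

print(top_k_categories(box, k=2))      # [('animal', 0.91), ('cat', 0.87)]
print(matches_any(box, {"cat"}, k=1))  # False: only the top result is inspected
print(matches_any(box, {"cat"}, k=3))  # True: "cat" is the second-best result
```

Looking only at the single best categorization misses "cat" here because "animal" happens to score higher, which is the situation the note warns about.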

Super-categories List

Categories	Covers
car	car, taxi
vehicle	vehicle, car, train, bicycle, taxi, motorcycle, bus, truck
footwear	footwear, roller skates
headwear	headwear, fedora
sports ball	sports ball, football, rugby ball, tennis ball
musical instrument	brass instrument, string instrument, piano, accordion, drum, flute
string instrument	string instrument, guitar, violin
food	food, apple, banana, berry, broccoli, carrot, citrus, coconut, egg, grape, pear, pumpkin, tomato, bread, cake, cheese, dessert, donut, fast food, hamburger, hot dog, ice cream, pizza, sandwich, sushi
drink	drink, hot drink, juice
dessert	dessert, cake, ice cream, donut
fast food	fast food, french fries, hot dog, pizza, hamburger, sandwich
furniture	furniture, bed, chair, couch, shelves, storage cabinet, table
jug	jug, teapot
lamp	lamp, candle
toy	toy, doll, teddy bear
flower	flower, rose, sunflower
animal	animal, bird, parrot, water bird, dolphin, fish, goldfish, jellyfish, seal, shellfish, whale, alpaca, bear, big cat, camel, cat, cow, crocodile, deer, dog, elephant, frog, giraffe, hippopotamus, horse, kangaroo, panda, pig, polar bear, rabbit, reptile, rhinoceros, sheep, squirrel, turtle, zebra
bird	bird, parrot, water bird
insect	insect, butterfly
fish	fish, goldfish
reptile	reptile, crocodile, frog, turtle
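One way to use this table in application code is to look up every category that covers a given one, so that a query for "goldfish" can also accept "fish" or "animal". The sketch below is illustrative only: the dictionary is a small hand-copied excerpt of the table (names written as they appear there, with spaces), and the helper name is hypothetical.

```python
from typing import Dict, Set

# Excerpt of the super-category table; the "animal" row is abbreviated here.
SUPER_CATEGORIES: Dict[str, Set[str]] = {
    "animal": {"animal", "bird", "parrot", "water bird", "fish", "goldfish", "cat", "dog"},
    "bird": {"bird", "parrot", "water bird"},
    "fish": {"fish", "goldfish"},
    "toy": {"toy", "doll", "teddy bear"},
}


def covering_categories(category: str) -> Set[str]:
    """Return the category itself plus every super-category that covers it."""
    covering = {category}
    for parent, covered in SUPER_CATEGORIES.items():
        if category in covered:
            covering.add(parent)
    return covering


print(sorted(covering_categories("goldfish")))  # ['animal', 'fish', 'goldfish']
print(sorted(covering_categories("parrot")))    # ['animal', 'bird', 'parrot']
print(sorted(covering_categories("bench")))     # ['bench']
```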

Person Detection Model Card v0.4

Model Details

Model last updated: 2024-02-29
Model version: v0.4
License: refer to the terms of service for Lightship.

Technical specifications

The object detection model returns a set of bounding boxes and reports the probability that the box is a person, a human hand, or a human face.

Intended use

Intended use cases

Identifying people (more specifically, human hands or human faces) in an image.
Querying the presence or absence of people, human hands, or human faces in an image.

Permitted users

Augmented reality developers through Niantic Lightship.

Out-of-scope use cases

This model does not provide the capability to:

Track individuals
Identify or recognise individuals

Factors

The following factors apply to all object detection provided in the Lightship ARDK, including person detection:

Scale: objects / classes may not be detected if they are very far away from the camera.
Lighting: extreme light conditions may affect the overall performance.
Viewpoint: extreme camera views that have not been seen during training may lead to a miss in detection or a class confusion.
Occlusion: objects may not be detected if they are covered by other objects.
Motion blur: fast camera or object motion may degrade the performance of the model.
Flicker: there may be a ‘jittering’ effect between predictions of temporally adjacent frames.

For person detection specifically, based on known problems with computer vision technology, we identify potential relevant factors that include subgroups for:

Geographical region
Skin tone
Gender
Body posture: certain body configurations may be harder to predict due to appearing less often in the training corpus.
Other: age, fashion style, accessories, body alterations, etc.

Fairness evaluation

At Niantic, we strive for our technology to be inclusive and fair by following strict equality and fairness practices when building, evaluating, and deploying our models. We define person detection fairness as follows: a model makes fair predictions if it performs equally on images that depict a variety of the identified subgroups. The evaluation results focus on measuring the performance of the union of the human channels (person, human hand, and human face) on the first three main subgroups (geographical region, skin tone, and gender).

Instrumentation and dataset details

Our benchmark dataset comprises 5650 images captured around the world using the back camera of a smartphone, with these specifications:

Only one person per image is depicted.
Both indoor and outdoor environments.
Captured with a variety of devices.
No occlusions.

Images are labeled with the following attributes:

Geographical region: based on the UN geoscheme, with the European subregions merged into a single region and Micronesia, Polynesia, and Melanesia merged into another:
- Northern Africa
- Eastern Africa
- Middle Africa
- Southern Africa
- Western Africa
- Caribbean
- Central America
- South America
- Northern America
- Central Asia
- Eastern Asia
- South Eastern Asia
- Southern Asia
- Western Asia
- Europe
- Australia and New Zealand
- Melanesia, Micronesia, and Polynesia
Skin tone: following the Fitzpatrick scale, images are annotated from subgroup 1 to 6. Skin tone is a self-reported value provided by the person in each image.
Gender: images are annotated with self-reported gender.

Metrics

The standard metric for evaluating object detection models -- and the one we use -- is Intersection over Union (IoU). It is computed as follows:

IoU = (area of overlap between predicted and ground-truth boxes) / (area of union of predicted and ground-truth boxes)

Reported IoUs are averages (mean IoU or mIoU) over images belonging to the referenced subgroup unless stated otherwise.
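As a concrete illustration of the metric, the following Python sketch computes IoU for a pair of boxes and the mean IoU over several pairs. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); that representation and the function names are assumptions made for the example, not details taken from the evaluation pipeline.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(predicted: Box, ground_truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(predicted[0], ground_truth[0])
    iy_min = max(predicted[1], ground_truth[1])
    ix_max = min(predicted[2], ground_truth[2])
    iy_max = min(predicted[3], ground_truth[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_pred = (predicted[2] - predicted[0]) * (predicted[3] - predicted[1])
    area_gt = (ground_truth[2] - ground_truth[0]) * (ground_truth[3] - ground_truth[1])
    union = area_pred + area_gt - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Average IoU over (predicted, ground_truth) pairs, one pair per image."""
    return sum(iou(pred, gt) for pred, gt in pairs) / len(pairs)


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0 for disjoint boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so 1/7
```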

Fairness criteria

A model is considered to be making unfair predictions if it yields a performance (mIoU) for a particular subgroup that is three standard deviation units or more from the mean across all subgroups.

Results

Geographical evaluation

Average performance across all 17 regions is 78.74% with a standard deviation of 1.22%. All regions exhibit a performance in the range of [76.92%, 82.17%]. The maximum difference between the mean and the worst performing region is 1.83%, within our fairness criterion threshold of 3 standard deviations (3x1.22% = 3.65%).

Regions	mIoU	stdev	Number of images
Northern Africa	78.26%	15.04%	301
Eastern Africa	77.41%	17.11%	336
Middle Africa	77.30%	15.72%	322
Southern Africa	79.09%	14.93%	368
Western Africa	79.04%	13.26%	364
Caribbean	79.01%	12.20%	412
Central America	79.44%	13.79%	415
South America	78.39%	14.21%	397
Northern America	79.09%	13.00%	335
Central Asia	79.52%	12.56%	229
Eastern Asia	77.60%	15.37%	346
South Eastern Asia	77.86%	14.86%	333
Southern Asia	79.34%	12.15%	353
Western Asia	78.80%	14.91%	370
Europe	79.40%	13.14%	320
Australia and New Zealand	76.92%	18.13%	374
Melanesia, Micronesia and Polynesia	82.17%	11.08%	75
Average (across all images)	78.55%	14.55%	5650
Average (across regions)	78.74%	1.22%	-
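The fairness criterion can be checked against the regional figures in the table above with a short Python sketch. The mIoU values are copied from the table; the function name is hypothetical, and the use of the sample standard deviation (statistics.stdev) is an assumption, chosen because it reproduces the reported 1.22% from the rounded table values.

```python
from statistics import mean, stdev
from typing import Dict, List

# Per-region mIoU in percent, copied from the table above.
REGION_MIOU: Dict[str, float] = {
    "Northern Africa": 78.26, "Eastern Africa": 77.41, "Middle Africa": 77.30,
    "Southern Africa": 79.09, "Western Africa": 79.04, "Caribbean": 79.01,
    "Central America": 79.44, "South America": 78.39, "Northern America": 79.09,
    "Central Asia": 79.52, "Eastern Asia": 77.60, "South Eastern Asia": 77.86,
    "Southern Asia": 79.34, "Western Asia": 78.80, "Europe": 79.40,
    "Australia and New Zealand": 76.92,
    "Melanesia, Micronesia and Polynesia": 82.17,
}


def unfair_subgroups(miou_by_group: Dict[str, float], n_sigma: float = 3.0) -> List[str]:
    """Return subgroups whose mIoU is n_sigma standard deviations or more from the mean."""
    values = list(miou_by_group.values())
    centre = mean(values)
    threshold = n_sigma * stdev(values)  # sample standard deviation (assumption)
    return [group for group, value in miou_by_group.items()
            if abs(value - centre) >= threshold]


values = list(REGION_MIOU.values())
print(round(mean(values), 2))         # 78.74
print(round(stdev(values), 2))        # 1.22
print(unfair_subgroups(REGION_MIOU))  # [] (no region reaches the threshold)
```

Because the inputs are the rounded table values, intermediate figures can differ from the reported ones in the last digit.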

Skin tone evaluation results

Average performance across all six skin tones is 78.58% with a standard deviation of 0.24%. All skin tone subgroups yield a performance in the range of [78.23%, 78.97%]. The maximum difference between the mean and the worst performing skin tone subgroup is 0.34%, within our fairness criterion threshold of 3 stdevs (3x0.24% = 0.71%).

Skin tone (Fitzpatrick scale)	mIoU	stdev	Number of images
1	78.59%	12.00%	247
2	78.49%	14.59%	1919
3	78.61%	14.39%	1463
4	78.23%	16.52%	457
5	78.97%	13.60%	706
6	78.56%	14.67%	858
Average (across all images)	78.55%	14.55%	5650
Average (across skin tones)	78.58%	0.24%	-

Gender evaluation results

Average performance of all evaluated gender subgroups is 78.53% with a range [78.01%, 79.05%]. The difference between the average and the worst performing gender is 0.52%, within our fairness criterion threshold of 3 stdevs (3x0.74% = 2.22%).

Perceived gender	mIoU	stdev	Number of images
Female	78.01%	15.08%	2585
Male	79.05%	13.96%	3065
Average (across all images)	78.55%	14.55%	5650
Average (across genders)	78.53%	0.74%	-

Ethical Considerations

Privacy: When the model is used in ARDK, inference is only applied on-device and the image is not transferred off the user device.
Human Life: This model is designed for entertainment purposes within an augmented reality application. It is not intended to be used for making human life-critical decisions.
Bias: Training datasets have not been audited for diversity and may present biases not surfaced by our benchmarks.

Caveats and Recommendations

Our annotated dataset only contains binary genders, which we include as male/female. Further data would be needed to evaluate across a spectrum of genders.
An ideal skin tone evaluation dataset would additionally include camera details and more environment details, such as lighting and humidity. Furthermore, the Fitzpatrick scale has limitations, as it doesn't represent the full spectrum of human skin tones.
This model card is based on the work of Mitchell, Margaret, et al. "Model cards for model reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019.
