In this step of the tutorial, we’ll walk through the creation of the previous example for realtime object detection on a live camera feed in only 10 lines of Python code. The program will load the detection network with the detectNet object, capture video frames and process them, and then render the detected objects to the display.
For your convenience and reference, the completed source is available in the python/examples/my-detection.py file of the repo, but the guide below assumes the file resides in your home directory or in an arbitrary directory of your choosing.
Here’s a quick preview of the Python code we’ll be walking through:
import jetson.inference
import jetson.utils
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0") # '/dev/video0' for V4L2
display = jetson.utils.videoOutput("display://0") # 'my_video.mp4' for file
while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)
    display.Render(img)
    display.SetStatus("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))
There’s also a video screencast of this coding tutorial available on YouTube.
First, open up your text editor of choice and create a new file. Below we’ll assume that you’ll save it on your host device under your user’s home directory as ~/my-detection.py, but you can name and store it where you wish. If you’re using the Docker container, you’ll want to store your code in a Mounted Directory, similar to what we did in the Image Recognition Python Example.
At the top of the source file, we’ll import the Python modules that we’re going to use in the script. Add import statements to load the jetson.inference and jetson.utils modules used for object detection and camera capture.
import jetson.inference
import jetson.utils
note: these Jetson modules are installed during the sudo make install step of building the repo.
If you did not run sudo make install, then these packages won’t be found when the example is run.
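One way to confirm that the install step worked before running the example is to ask Python whether it can locate the modules. The sketch below is only an illustration: module_available is a hypothetical helper that is not part of the repo, and it relies solely on the standard library's importlib.

```python
import importlib.util

def module_available(name):
    """Return True if Python can locate the named module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # raised when a parent package (e.g. 'jetson') is itself missing
        return False

for name in ("jetson.inference", "jetson.utils"):
    if module_available(name):
        print("{:s}: found".format(name))
    else:
        print("{:s}: missing (was 'sudo make install' run?)".format(name))
```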
Next, use the following line to create a detectNet object instance that loads the 91-class SSD-Mobilenet-v2 model:
# load the object detection model
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
Note that you can change the model string to one of the values from this table to load a different detection model. We also set the detection threshold here to the default of 0.5 for illustrative purposes - you can tweak it later if needed.
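To build some intuition for what the threshold does, the sketch below filters a list of made-up confidence scores in plain Python. It is an illustration under assumptions, not the detectNet implementation: the candidates data and the keep_confident helper are hypothetical, the real network applies its threshold internally, and whether a score exactly equal to the threshold is kept is assumed here.

```python
def keep_confident(candidates, threshold=0.5):
    """Keep only (label, confidence) pairs whose confidence meets the threshold."""
    return [(label, conf) for label, conf in candidates if conf >= threshold]

# hypothetical raw outputs: (class label, confidence score)
candidates = [("person", 0.92), ("dog", 0.48), ("chair", 0.61), ("cup", 0.12)]

print(keep_confident(candidates, threshold=0.5))
print(keep_confident(candidates, threshold=0.9))
```

Raising the threshold leaves fewer but more confident results, while lowering it admits more results at the cost of more false positives.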
To connect to the camera device for streaming, we’ll create an instance of the videoSource object:
camera = jetson.utils.videoSource("csi://0") # '/dev/video0' for V4L2
The string passed to videoSource() can actually be any valid resource URI, whether it be a camera, video file, or network stream. For more information about video streams and protocols, please see the Camera Streaming and Multimedia page.
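To see how these resource strings are structured, the sketch below splits the examples used in this tutorial into a protocol and a resource with the standard library's urlparse. This is not how videoSource() parses its argument internally; describe_stream is a hypothetical helper for illustration only.

```python
from urllib.parse import urlparse

def describe_stream(uri):
    """Split a stream URI into (protocol, resource); plain paths have no protocol."""
    parsed = urlparse(uri)
    if not parsed.scheme:
        return ("file or device path", uri)
    return (parsed.scheme, parsed.netloc + parsed.path)

for uri in ("csi://0", "/dev/video0", "display://0", "my_video.mp4"):
    print(uri, "->", describe_stream(uri))
```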
note: for compatible cameras to use, see these sections of the Jetson Wiki:
- Nano: https://eLinux.org/Jetson_Nano#Cameras
- Xavier: https://eLinux.org/Jetson_AGX_Xavier#Ecosystem_Products_.26_Cameras
- TX1/TX2: developer kits include an onboard MIPI CSI sensor module (OV5693)
Next, we’ll create a video output interface with the videoOutput object and create a main loop that will run until the user exits:
display = jetson.utils.videoOutput("display://0") # 'my_video.mp4' for file
while display.IsStreaming():
    # main loop will go here
Note that the remainder of the code below should be indented underneath this while loop. Similar to above, you can substitute the URI string for other types of outputs found on this page (like video files, etc.).
The first thing that happens in the main loop is to capture the next video frame from the camera. camera.Capture() will wait until the next frame has been sent from the camera and loaded into GPU memory.
img = camera.Capture()
The returned image will be a jetson.utils.cudaImage object that contains attributes like width, height, and pixel format:
<jetson.utils.cudaImage>
.ptr # memory address (not typically used)
.size # size in bytes
.shape # (height,width,channels) tuple
.width # width in pixels
.height # height in pixels
.channels # number of color channels
.format # format string
.mapped # true if ZeroCopy
For more information about accessing images from Python, see the Image Manipulation with CUDA page.
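The attributes listed above are related to one another, which the sketch below tries to make concrete. It uses a hypothetical stand-in (FakeImage) rather than a real cudaImage, and it assumes a format such as rgb8 that stores one byte per channel; other formats would need a different bytes_per_channel value.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring some of the attributes listed above; the real
# jetson.utils.cudaImage is allocated by the library in GPU memory.
FakeImage = namedtuple("FakeImage", "width height channels format")

def summarize(img, bytes_per_channel=1):
    """Derive the shape tuple and byte size from width, height, and channels."""
    shape = (img.height, img.width, img.channels)
    size = img.width * img.height * img.channels * bytes_per_channel
    return shape, size

frame = FakeImage(width=1280, height=720, channels=3, format="rgb8")
shape, size = summarize(frame)
print("shape:", shape)
print("size in bytes:", size)
```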
Next, the detection network processes the image with the net.Detect() function. It takes in the image from camera.Capture() and returns a list of detections:
detections = net.Detect(img)
This function will also automatically overlay the detection results on top of the input image.
If you want, you can add a print(detections) statement here, and the coordinates, confidence, and class info will be printed out to the terminal for each detection result. Also see the detectNet documentation for info about the members of the Detection structures that are returned, so that you can access them directly in a custom application.
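As a rough sketch of accessing those members directly, the helper below formats one line per detection. It assumes each Detection object exposes ClassID, Confidence, Left, Top, Right, and Bottom attributes; check these names against the detectNet documentation before relying on them. The summarize_detections helper and the stand-in objects are hypothetical and exist only so the sketch runs without a Jetson.

```python
from types import SimpleNamespace

def summarize_detections(detections):
    """Format one line per detection from its class ID, confidence, and box."""
    lines = []
    for det in detections:
        lines.append("class {:d}, {:.0f}% confidence, box ({:.0f}, {:.0f}) to ({:.0f}, {:.0f})".format(
            det.ClassID, det.Confidence * 100, det.Left, det.Top, det.Right, det.Bottom))
    return lines

# stand-in object with assumed attribute names; in the real script you would
# pass the list returned by net.Detect(img) instead
fake = [SimpleNamespace(ClassID=1, Confidence=0.87, Left=100.0, Top=50.0, Right=300.0, Bottom=400.0)]
for line in summarize_detections(fake):
    print(line)
```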
Finally, we’ll visualize the results with OpenGL and update the title of the window to display the current performance:
display.Render(img)
display.SetStatus("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))
The Render() function will automatically flip the backbuffer and present the image on-screen.
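The status string relies on Python's {:.0f} format specifier, which prints a floating-point value rounded to zero decimal places. The small sketch below shows the effect in isolation; status_text is a hypothetical helper and the FPS values are made up rather than measured.

```python
def status_text(fps):
    """Build the window title used in the tutorial for a given FPS value."""
    return "Object Detection | Network {:.0f} FPS".format(fps)

# made-up frame rates, purely to show the rounding
print(status_text(29.7))
print(status_text(12.2))
```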
That’s it! For completeness, here’s the full source of the Python script that we just created:
import jetson.inference
import jetson.utils
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0") # '/dev/video0' for V4L2
display = jetson.utils.videoOutput("display://0") # 'my_video.mp4' for file
while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)
    display.Render(img)
    display.SetStatus("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))
Note that this version assumes you are using a MIPI CSI camera. See the Opening the Camera Stream section above for info about changing it to use a different kind of input.
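If you would rather not edit the script each time you switch inputs, one option is to read the stream URIs from the command line. The sketch below uses only the standard library's argparse. The argument names (input, output, --network, --threshold) and the parse_args helper are hypothetical choices for this illustration, not options defined by the tutorial script.

```python
import argparse

def parse_args(argv=None):
    """Parse hypothetical command-line options for the detection script."""
    parser = argparse.ArgumentParser(description="Realtime object detection on a video stream")
    parser.add_argument("input", nargs="?", default="csi://0", help="input stream URI")
    parser.add_argument("output", nargs="?", default="display://0", help="output stream URI")
    parser.add_argument("--network", default="ssd-mobilenet-v2", help="detection model to load")
    parser.add_argument("--threshold", type=float, default=0.5, help="minimum detection confidence")
    return parser.parse_args(argv)

# an explicit argument list is passed here for demonstration;
# call parse_args() with no arguments to read the real command line
args = parse_args(["/dev/video0", "--threshold", "0.6"])
print(args.input, args.output, args.network, args.threshold)
```

The parsed values could then be passed to detectNet(), videoSource(), and videoOutput() in place of the hard-coded strings.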
To run the application we just coded, simply launch it from a terminal with the Python interpreter:
$ python3 my-detection.py
To tweak the results, you can try changing the model that’s loaded along with the detection threshold. Have fun!