
Peek Under the Hood: How to Build an AI Camera?

Log by log: See how we build reCamera V2.0! Platform benchmarks, CAD iterations, deep debug dives. Open build with an engineer’s eye view!

jenna

Ever wondered how an AI camera goes from an idea to your hands? The team and I are live-streaming the birth of reCamera V2.0 here (or, better said, the whole AI camera roadmap), and we're inviting YOU to join the development process: witness milestones, sway key decisions, and build alongside us. (You can take the reCamera V1.0 as a reference before you dive in: https://www.seeedstudio.com/reCamera-2002w-64GB-p-6249.html.)

You can expect raw, real-time logs here in our record:

  • Platform wars? Real benchmarking battles.
  • Hardware tradeoffs? CAD tears → triumphs.
  • AI model crashes? Debug diaries.
  • Prototype fails? Glorious lessons (shared loudly).
  • And more.

The team and I will document the whole process of building our next-gen AI camera here, transparently. So, dear community: grab your virtual lab coat, suggest sensors, critique code, or simply watch the magic unfold! From sketches to shelves, this is how AI cameras are born.

🤔 How should you understand the birth of this AI camera series, which we call reCamera? The following explanation may help make it clear.

⛳Our vision is clear: to build a comprehensive matrix of AI cameras that meets diverse real-world needs. Last year, we took a significant step forward by launching reCamera 2002—an open-source AI camera reference design hailed as "the shortest pathway to a market-ready AI camera."

From its earliest development stages, reCamera 2002 (please find a comprehensive intro below👇) was shaped by invaluable feedback and suggestions from our vibrant community. It was your collective voice that helped us craft this beloved first-generation device. 💚

Therefore, as we began planning the next evolution of reCamera and the specific construction of our broader AI camera matrix, we knew we wanted to share everything with the community, embracing a radical open-source spirit! 🫶

This is why we're launching "Peek Under the Hood: How to Build an AI Camera?" – an ongoing series where we open-source and continuously update every detail and critical step involved in building our AI cameras, right from the very start. 


Through this open-source journey, we aim to:

  • Build Truly Community-Driven Products: Leverage the insights of our community and talented developers by sharing openly, so that reCamera evolves to meet real needs and solve real challenges.
  • Give Back & Inspire: Share every piece of our learnings, successes, and even setbacks to empower others in the open-source ecosystem. Let our journey fuel yours.

Join us as we openly pioneer the future of accessible, powerful AI Vision hardware—together! 👐


💡 Why do we make this first-gen reCamera?

Today, as processors (both SoCs and MCUs) become smaller and more powerful, it is now possible to combine the processor with camera sensors. In fact, many IPCs (IP cameras) already use this design to accelerate AI detection on edge devices.

So today, we introduce reCamera, an open-source camera platform for everyone to play with. We have divided this project into 3 interchangeable parts:

  • Core Board
  • Sensor Board
  • Base Board

This design allows users to easily change the interfaces and camera sensors to suit their needs. We hope that these components can be freely combined in any way.

By building this hardware platform and ecosystem, we enable other applications to grow on this platform without the need to worry about changing from one platform to another in the future.

The engineering focus on modularity, high performance, and ease of use ensures that reCamera serves as a powerful platform for developers and makers. This design streamlines complex hardware processes, allowing users to integrate vision AI capabilities into their systems efficiently and creatively.

We've taken care of the intricate hardware work, freeing up time for user innovation. The modular design lets users rapidly switch cameras and customize interfaces, cutting development time from months to just weeks!

  • Interface/Base Board - SensorCam P4 Housing Assembly

    jianwei wang 2 days ago 0 comments

    After several days of waiting, both the PCB board and the 3D model have been printed. All the components are shown in the figure below. We have pre-soldered the PCB and connected the components to the adapter board. Now let's assemble them.

    First, fix the main body of the head to the main board with screws.

    Then place them in the groove of the handle front cover.

    Insert the adapter board into the motherboard, first place the 5-key button, then the knob, and finally the battery.

    Then fit the head cover. Install the magnets into the magnet slots first, then insert the protrusion on the cover into the opening in the main adapter board, and press it gently.

    Finally, cover the handle back cover.

    Completed.

    Assembly video:

    Overall, after all the components are installed into the shell, they are relatively stable, which indicates that our shell design is acceptable. Next, we will try to make the first demo: Camera + ranging module.

     

  • Interface/Base Board-SensorCam P4 Adapter Board Design

    jianwei wang 6 days ago 0 comments

    After selecting the hardware for the SensorCam P4 last week, we found that the connector brought out from the main board sits at the bottom-right corner of the board. However, what we need is a plug-and-play structure: if a module is plugged in directly at the original position, the sensor will not sit at the center of the screen, and the data it collects will be inaccurate. We therefore decided to make an adapter board that routes the pins to the center of the screen, so that more accurate data can be obtained. The schematic is as follows:

    Here, U1 is a 40-pin header that connects to the interface brought out from the main board. U2, U3, and U4 are headers placed at the middle of the screen; a three-sided layout is used to improve stability when a module is plugged in. U5 is the interface for the buttons, and U6 is the interface for the knob. Pins 4, 5, and 32 serve as module-identification pins rather than fully fixed pins: their levels differ from module to module, so they can be used to tell modules apart. The main controller automatically identifies the module by reading the levels of these three pins and then displays the matching UI and data. In theory, SensorCam P4 can automatically identify up to 8 modules this way, and more pins can be added for expansion. Below is the 3D model of the adapter board:

    In addition to the main adapter board, we have also made adapter boards for the ranging module and the thermal imaging module so that they mate with the main adapter board, giving a foolproof, direct plug-in design. Below are the schematic and 3D views of the ranging module's adapter board:

    Here, H5, H6, and H7 set the ranging module's identification code to 000, and all of its pins are fixed, so there is no need to worry about inserting them incorrectly. The thermal imaging module's adapter board is a bit more complicated because it uses a board-to-board connector, as shown in the figure below:

    We have brought out the I2C and SPI interfaces it needs. The remaining unused pins are also brought out for now and can simply be left unconnected. A filter capacitor has been added on the power supply as well. Below is its 3D view:

    H14, H15, and H16 are connected to VCC, GND, and GND respectively, so the coding of the thermal imaging module is 100. Later, we will model the overall shell based on the actual size of these adapter boards connected to the main board.
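    Since the three ID pins boil down to a 3-bit code, the identification logic is easy to prototype in plain software. Below is a minimal Python sketch of that lookup, using the two codes that appear in this log (000 for the ranging module, 100 for the thermal imaging module); the bit ordering, the table entries, and the boolean pin inputs are assumptions for illustration, not the actual SensorCam P4 firmware.

```python
# Hypothetical sketch of the 3-bit module-identification scheme described above.
# Bit ordering and table contents are assumptions; only codes 000 (ranging)
# and 100 (thermal imaging) come from this log.

MODULE_TABLE = {
    0b000: "ranging",
    0b100: "thermal_imaging",
    # up to 2**3 = 8 combinations are available for future modules
}

def decode_module(id_pin_a: bool, id_pin_b: bool, id_pin_c: bool) -> str:
    """Pack the three ID-pin levels into a 3-bit code and look it up."""
    code = (int(id_pin_a) << 2) | (int(id_pin_b) << 1) | int(id_pin_c)
    return MODULE_TABLE.get(code, "unknown")

if __name__ == "__main__":
    # Thermal module ties its ID pins to VCC, GND, GND -> code 100
    print(decode_module(True, False, False))    # thermal_imaging
    # Ranging module ties all three ID pins low -> code 000
    print(decode_module(False, False, False))   # ranging
```

    On the real device this lookup would presumably run on the ESP32-P4 after sampling the three GPIOs, and the result would select the matching UI.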

     

  • SensorCam P4 Housing Design

    jenna 09/18/2025 at 02:17 0 comments

    The SensorCam P4 is a handheld device similar in shape to a magnifying glass. To ensure convenience during 3D printing, we divided the device into the handle and the head. The overall design is shown in the figure below:

    • The handle and the head are connected through a mortise-and-tenon joint,
    • the front cover and the back cover of the handle are joined by a snap-fit buckle,
    • and the back cover of the head attaches to the main body by magnetic attraction.

    The combined state is shown in the figure below:

    1. Diagram of the connection between the handle and the head

    2. Connection Diagram of Handle Front and Rear Covers (Cross Section)

    3. Magnetic connection diagram of the head and the back cover

    Below are the design details of each module.

    Head

    ❶❷❸❹: Magnet placement slot.

    ❺❻❼❽: Screw fixing hole.

    ❾❿: Type-C port.

    ⓫: USB port.

    ⓬⓭: Button port.

    ⓮: SD card port.

    ⓯: Wire channel.

    ⓰: Reserved space for camera interface.


    Head cover

    ❶❷❸❹: Magnet placement slot. 

    ❺❻❼: Reserved slot for the female header of the main adapter board. 

    ❽: Adapter board support, preventing the adapter board from deforming inward when the module is inserted.

    Handle back cover

    ❶❷❸❹: Coupling female socket.

    ❺: Knob reserved hole.

    ❻❼: Battery DC charging port support.

    ❽: Wire channel.

    Overall effect

    Front Side

    Back Side

  • Sensor Selection for ESP32 P4-SensorCam P4

    IoTEgr 09/05/2025 at 02:00 0 comments

    After completing the screen selection, we further planned the overall hardware design scheme for SensorCam P4, which roughly includes the following components:

    The handle contains batteries. Next, I will elaborate on our selection considerations and final decisions regarding the mainboard, control components, power supply system, and other aspects.

    1.  Base Board

    After the screen model was selected, we found that there were compatibility challenges between the ESP32-P4 chip and the originally planned display. After evaluating the form factor, interfaces, and performance of candidate main boards, we finally chose an ESP32-P4 development board with an integrated 3.4-inch touch display as the core main board, as shown in the figure below:

    This development board has the following advantages:

    ● Compact design: The core board is small enough to save valuable space.

    ● Rich interfaces: Provides a variety of peripheral interfaces with strong expandability.

    ● Communication capability: Onboard ESP32-C6-MINI module, supporting Wi-Fi connectivity.

    ● Excellent display performance: 800×800 high resolution, 70% NTSC wide color gamut, 300cd/m² high brightness.

    ● Form fit: The shape design meets the ergonomic requirements of a handheld magnifying glass.

    This motherboard not only solves compatibility issues, but its highly integrated design also greatly simplifies our system architecture and reduces the overall complexity.

    2.  Knob

    Why are physical knobs still needed in the touchscreen era? Because we recognize that for fine operations (such as image zooming, parameter adjustment, and menu navigation), the tactile feedback and precise control provided by physical knobs are difficult for touchscreens to replace. The knob we chose is the EC11, as shown in the figure below:

    The EC11 knob has the following characteristics:

    ● Suitable size: The diameter and height conform to ergonomics, ensuring comfortable operation.

    ● Exquisite appearance: The metallic texture is consistent with the overall style of the device.

    ● High performance: High sensitivity and accurate scale feedback.

    ● Economical: Excellent cost-performance ratio and a stable market supply.

    The EC11 knob will provide users with an intuitive parameter adjustment experience, and it performs exceptionally well especially in scenarios that require quick and precise adjustments.

    3.  Buttons

    Although the device has a touchscreen, we firmly believe that physical buttons still have irreplaceable value in specific scenarios: quick operations, blind operation, and clear physical feedback. The buttons we chose are five-key circular keypad modules, as shown in the figure below:

    The five-key circular keypad module has the following advantages:

    ● Reasonable layout: The circular arrangement conforms to the natural movement trajectory of fingers.

    ● Excellent touch: The keys have a crisp sound and clear feedback.

    ● High sensitivity: Fast response with no sense of delay.

    ● Ergonomic design: The key travel and actuation force have been optimized, reducing fatigue during long operating sessions.

    These buttons will serve as a supplement to touch operations, providing users with more diversified options for interaction methods.

    4.  Battery

    To meet the requirement of long-term use of portable devices, we have chosen a 5600mAh 5V lithium battery as the power solution for the following reasons:

    ● High energy density: 5600mAh capacity ensures that the device can work continuously for a long time.

    ● Stable output: 5V output voltage perfectly matches the device requirements.

    ● Reusable: Supports repeated charging and...

    Read more »

  • Interface/Base Board - Screen selection for ESP32P4-SensorCam P4

    IoTEgr 09/01/2025 at 07:20 0 comments

    During the process of selecting a screen for the SensorCam P4, after our comprehensive evaluation and multiple rounds of testing, we finally selected two representative display screens:

    Display A: a 2.83-inch, 480×640 non-fully-laminated TFT-LCD supporting 16.7M colors and touch, with a 40-pin 18-bit RGB + SPI interface.

    Display B: a 2.4-inch, 240×320 fully laminated TFT-LCD supporting 263K colors, without touch, using a 14-pin 4-wire SPI interface.

    These two screens represent different product positioning: Display A is stronger in pixel density, color performance, and feature completeness; Display B has advantages in cost, ease of procurement, and its narrow-bezel, fully laminated construction. Below, we systematically compare their performance on parameters such as static and dynamic resolution, pixel density, brightness, color gamut, and refresh rate. Since the SensorCam P4 is mainly used indoors, we conducted the tests indoors.
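    Before the side-by-side photos, a quick sanity check on the pixel-density figures: PPI can be estimated from resolution and diagonal size alone, as the short Python sketch below shows (numbers taken from the spec summary above).

```python
import math

def ppi(width_px: int, height_px: int, diagonal_inches: float) -> float:
    """Pixels per inch from panel resolution and diagonal size."""
    return math.hypot(width_px, height_px) / diagonal_inches

# Display A: 2.83-inch, 480x640  -> roughly 283 PPI
# Display B: 2.4-inch,  240x320  -> roughly 167 PPI
print(f"Display A: {ppi(480, 640, 2.83):.0f} PPI")
print(f"Display B: {ppi(240, 320, 2.4):.0f} PPI")
```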

    Comparison of static display effects

    The real-shot comparison photos make the differences in pixel fineness and color reproduction between the two screens easy to see (Display A on the left, Display B on the right):

    1. Displaying complex images to compare detail rendering

    It can be seen that Display A is significantly superior to Display B in terms of pixel density and edge sharpness, with stronger detail expression.

    2. Displaying color-rich images to compare color reproduction

    It is obvious that the overall display effect of Display A is significantly more delicate and has richer color gradations.

    3. Displaying thermal-imaging UI mockup screens for comparison

    The performance of the two screens on such images is similar, with no obvious differences.

    Dynamic display performance test

    Display A: Supports a MIPI interface, with stable and smooth frame rates, fully meeting dynamic display requirements.

    Display B:

    ● When using the screen refresh function for color gradient testing, the frame rate can reach 63fps, which is basically smooth.

    ● There is obvious stuttering when running LVGL animations, and performance is limited when handling complex graphical interfaces.

    Summary and selection suggestions

    The comprehensive comparison results show that there is a significant gap between the two screens:

    Advantages of Display A:

    ● Higher refresh rate and frame rate stability.

    ● Wider color gamut range and color expressiveness.

    ● Higher pixel density results in a delicate display effect.

    ● Supports touch interaction.

    ● Larger display area. 

    Advantages of Display B:

    ● Lower procurement costs.

    ● Simple interface, few occupied pins.

    ● Narrow-bezel, fully laminated design.

    Although Display A uses a non-fully-laminated design and has relatively wide bezels, these factors have limited impact on the actual user experience. Given the SensorCam P4's high requirements for display quality and user experience, we ultimately chose Display A as the screen solution for this project. Its excellent display performance and touch functionality will give users an interactive experience far superior to that of Display B, which better matches the project's positioning around high-quality visual presentation.

  • ESP32 P4 + Camera + Sensors = ?

    IoTEgr 08/22/2025 at 02:19 0 comments

    During the development of AI cameras, we realized that the form of AI cameras is far more than just RGB or depth cameras. "Specialized cameras" such as single-point ranging and thermal imaging do not rely on complex image processing, yet they can directly capture key physical information like temperature and distance, providing us with a new perspective to observe the world. This inspired us: can we create a lightweight hardware device that makes these efficient sensing capabilities more user-friendly and directly integrates them with camera images or even AI detection results, thereby achieving a wide range of functions?

    Based on this concept, we propose SensorCam P4 — a modular sensing device with a camera at its core. This device is based on the high-performance ESP32-P4 main control, and its core capabilities are realized through pluggable expansion backplanes. There is no need to reflash the firmware; you only need to insert the corresponding sensing module according to your needs to quickly expand the functions, such as:

    • Insert a thermal imaging module, and thermal images can be overlaid on the camera view or shown split-screen in real time, with abnormal areas such as hot or cold spots automatically identified and marked.
    • Insert a laser ranging module, and the precise distance to an object, along with what the object is, is displayed on screen in real time.
    • Insert temperature, humidity, and air-pressure sensors to acquire and display environmental data.

    SensorCam P4 adopts a highly modular design, shedding the clutter of traditional multi-sensor integration solutions and focusing on deep fusion of camera images and sensing data. The camera no longer just "sees colors"; it can also let you "see temperature", "see distance", and so on, making multi-dimensional data easy to obtain. You can also add custom modules as needed. The device automatically identifies the type of sensor that is inserted and loads the matching UI. For example, in thermal imaging mode you can choose an overlay or split-screen display; in ranging mode it shows values and an aiming reference line, and so on. A rough mockup is shown below.
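    To make the overlay idea concrete on the software side, here is a rough Python/OpenCV sketch that colorizes a low-resolution thermal map and blends it over a camera frame. It uses synthetic data and assumes OpenCV and NumPy are available, so treat it as an illustration of the display concept rather than the SensorCam P4 firmware (which runs on the ESP32-P4).

```python
import cv2
import numpy as np

def overlay_thermal(rgb_frame: np.ndarray, thermal_raw: np.ndarray,
                    alpha: float = 0.4) -> np.ndarray:
    """Colorize a low-res thermal map and blend it over the camera frame."""
    # Normalize raw readings to 0-255 and apply a false-color palette
    norm = cv2.normalize(thermal_raw, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    colored = cv2.applyColorMap(norm, cv2.COLORMAP_JET)
    # Upscale the thermal map to the camera resolution before blending
    colored = cv2.resize(colored, (rgb_frame.shape[1], rgb_frame.shape[0]))
    return cv2.addWeighted(rgb_frame, 1.0 - alpha, colored, alpha, 0)

if __name__ == "__main__":
    camera = np.full((480, 640, 3), 128, dtype=np.uint8)               # stand-in camera frame
    thermal = np.random.uniform(20, 40, (32, 24)).astype(np.float32)   # fake low-res sensor
    fused = overlay_thermal(camera, thermal)
    cv2.imwrite("fused.png", fused)
```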

    Why choose ESP32 P4 as the main controller

    Because its characteristics align closely with the device's core requirements of efficiently processing camera data, handling AI tasks, and fusing sensor data, specifically:

    1. Native camera and display support

    • Equipped with a MIPI-CSI interface, it can directly connect to high-resolution camera modules to stably acquire image data.
    • A native MIPI-DSI display interface that can directly drive MIPI screens to fully present the fused images.

    2. Powerful processing capability and built-in AI capability

    • Equipped with a 400MHz main frequency CPU, its performance is far superior to the previous generation of ESP32, and it can easily handle concurrent data from multiple high-speed sensors (such as cameras, thermal imaging).
    • Built-in vector instruction set and AI acceleration unit, supporting efficient local AI inference, which can directly run machine learning models such as image recognition and target detection on the device side, providing intelligent underlying support for sensor fusion.
    • Supports hardware graphics acceleration and scaling engine, which can efficiently complete image overlay, scaling and graphics rendering, significantly reducing CPU load.

    3. Rich connectivity capabilities

    • Provides various interfaces such as UART, SPI, I2C, and ADC with sufficient bandwidth, laying the foundation for simultaneous access to multiple modules such as ranging, thermal sensing, temperature and humidity.
    • Supports Wi-Fi and Bluetooth 5.x, facilitating wireless data transmission, remote monitoring, and OTA firmware upgrades.

    4. Mature development environment

    • Based on the ESP-IDF framework, it features...
    Read more »

  • What kind of AI camera web interface do you want?

    Deng MingXi 08/21/2025 at 05:47 0 comments

    reCamera Monitoring Interface Product Research

    This is both a research sharing post and a discussion topic. As Makers/consumers, what features do you want in a network monitoring interface? Feel free to leave a comment below, or provide suggestions in our community (https://github.com/Seeed-Studio/OSHW-reCamera-Series/discussions). If your suggestion is adopted, we will give you a product as a gift when reCamera is launched.

    Motivation: As the first page users interact with reCamera, it should present a sufficiently clear, powerful, and interactive interface.

    When users use smart AI cameras (or remote network cameras), what do they want to see from the interface?

    Product Goals: Clarity, interactivity, replaceable and expandable functions, and result output information.

    User Needs:

    Why would users buy an AI camera instead of a traditional IPC?

    What are the advantages of AI cameras?

    More expandable? More tailored to their own needs? Able to intelligently identify and alarm?

    Expandability is reflected in:

    1. Detection models can be replaced and self-trained.

    2. Detection logic can be customized, including defining event triggering logic and selecting detection areas.

    3. Output results can be exported and easily integrated into developers' own programs.

    Event triggering and alarming are important functions of surveillance cameras.

    User needs in different scenarios:

    Home users: Hope to detect abnormal situations at home in a timely manner through intelligent recognition, such as strangers breaking in, fire hazards, etc., and expect the interface to be simple and easy to operate.

    Enterprise users: Need comprehensive monitoring of production sites, office areas, etc., requiring intelligent recognition of production violations, personnel attendance, etc., and hope to couple with the enterprise's own management system.

    Developer users: Focus on the product's expandability and secondary development capabilities, hoping to replace detection models, self-train models, and integrate output results into their own programs.

    Market Cases:

    Most products from large companies are B2B offerings, making it difficult for users to do secondary development.

    Hikvision Algorithm Platform:

    Self-developed platform with drag-and-drop processing steps (operators written by Hikvision, mostly for industrial processing, presumably traditional CV). However, Hikvision's AI Open Platform provides one-stop self-training and deployment.

    Hikvision AI Open Platform Case: Detection of masks and chef hats in the kitchen.

    If we focus on expandability, providing a secondary development platform is crucial.

    DJI Osmo Series:

    The architecture is mainly host-side software plus a camera that purely streams. The camera pairs via Bluetooth and transmits images over network protocols, while the settings and interface run entirely on the host, reducing the load on the device side.

    TP-Link:

    Network configuration, storage information, event triggering, camera resolution.

    Login directly by entering the IP.

    It also provides official software to monitor multiple images.

    VCN 19 - Computer client remote monitoring method - TP-LINK Visual Security

    Edge Computing Box - AI Algorithm Box - AI Edge Box - Kunyun Technology:

    Supports SDK interfaces, mainstream frameworks such as PyTorch, customization, 4 TOPS of computing power, and 4K@60fps, but more like re.

    Content and Functional Requirements:

    Functions marked in yellow are relatively rare or even non-existent in the market.

    Basic Part:

    - Video stream display (different code streams can be selected, low code stream has higher fluency)

    - Display IP address and current time on the video screen

    - Basic operation controls: pause/play, record, screenshot, audio on/off, and PTZ (an extended function linked with a gimbal: pan-tilt control, direction keys plus a zoom slider if available).

    ... Read more »

  • Compatibility Testing - RV1126B - Adapt to the video analysis algorithm sharing of embedded devices

    Deng MingXi 08/19/2025 at 07:01 0 comments

    Why Deploy Video Detection Models on Embedded Devices?

    When we talk about visual AI, many people first think of high-precision models on the server side. However, in real-world scenarios, a large number of video analysis requirements actually occur at the edge: abnormal behavior warning of smart cameras, road condition prediction of in-vehicle systems... These scenarios have rigid requirements for low latency (to avoid decision lag), low power consumption (relying on battery power), and small size (to be embedded in hardware devices). If video frames are transmitted to the cloud for processing, it will not only cause network delay but may also lead to data loss due to bandwidth limitations. Local processing on embedded devices avoids these problems. Therefore, slimming down video detection models and deploying them to the edge has become a core requirement for industrial implementation.

    Isn't YOLO Sufficient for Visual Detection?
    The YOLO series (You Only Look Once), as a benchmark model for 2D object detection, is famous for its efficient real-time performance, but it is essentially a single-frame image detector. When processing videos, YOLO can only analyze frame by frame and cannot capture spatiotemporal correlation information between frames: for example, a "waving" action may be misjudged as a "static hand raising" in a single frame, while the continuous motion trajectory across multiple frames makes the intent clear.
    In addition, video tasks (such as action recognition and behavior prediction) often need to understand the "dynamic process" rather than isolated static targets. For example, in the smart home scenario, recognizing the "pouring water" action requires analyzing the continuous interaction between the hand and the cup, which is difficult for 2D models like YOLO because they lack the ability to model the time dimension.

    Basic Knowledge of Video Detection Models: From 2D to 3D
    A video is essentially four-dimensional data of "time + space" (width × height × time × channel). Early video analysis often adopted a hybrid scheme of "2D CNN + temporal model" (such as I3D), that is, first using 2D convolution to extract single-frame spatial features, and then using models like LSTM to capture temporal relationships. However, this scheme does not model spatiotemporal correlations closely enough.
    3D Convolutional Neural Networks (3D CNNs) perform convolution operations directly in three-dimensional space (width × height × time), extracting both spatial features (such as object shape) and temporal features (such as motion trajectory) through sliding 3D convolution kernels. For example, a 3×3×3 convolution kernel covers a 3×3 spatial area within a single frame and also spans 3 consecutive frames in the time dimension, thus naturally adapting to the dynamic characteristics of video.
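    The shape arithmetic of a 3×3×3 kernel is easy to verify with a few lines of PyTorch (assumed installed): a Conv3d layer consumes a clip shaped (batch, channels, time, height, width) and slides across the time dimension as well as space.

```python
import torch
import torch.nn as nn

# A single 3x3x3 convolution: the kernel spans 3 frames in time and a 3x3 spatial patch
conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# A dummy clip: batch of 1, RGB channels, 16 frames of 112x112 video
clip = torch.randn(1, 3, 16, 112, 112)
features = conv3d(clip)

print(features.shape)  # torch.Size([1, 16, 16, 112, 112]) -- time dimension preserved by padding=1
```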

    Why Introduce Efficient 3DCNNs Today?
    Although 3D CNNs can effectively model video spatiotemporal features, traditional models (such as C3D and I3D) have huge parameters and high computational costs (often billions of FLOPs), making them difficult to deploy on embedded devices with limited computing power (such as ARM architecture chips).
    The Efficient 3DCNNs proposed by Köpüklü et al. are designed to solve this pain point:
    1. Extremely lightweight design: Through techniques such as 3D depthwise separable convolution and channel shuffle, model parameters and computation are reduced by 1-2 orders of magnitude (for example, the FLOPs of 3D ShuffleNetV2 are only about 1/10 of ResNet-18's) while maintaining high precision;
    2. Hardware friendliness: It supports dynamically adjusting model complexity through a width multiplier (such as 0.5x, 1.0x) to fit embedded devices with different compute budgets;
    3. Plug-and-play engineering capability: The open-source project provides complete pre-trained models (supporting datasets such...

    Read more »

  • Demo - RV1126B People Counting

    Deng MingXi 07/24/2025 at 06:26 0 comments

    In retail stores, exhibition halls, and similar venues, a real-time grasp of how many people enter and how long customers stay is the core basis for optimizing operating strategy. Based on the RV1126B edge computing platform, we have built a lightweight people-flow detection demo that uses camera capture and local processing to achieve accurate footfall counting and dwell-time analysis. More importantly, the solution's technical logic is tied closely to its application scenarios, balancing detection accuracy with real operational needs.

    Display of Detection Results:

    Generate Statistical Result Text

    Core Logic: How to Make Cameras "Understand" People Flow?

    The core idea of this Demo is to let the system "observe - judge - record" like the human eye, which is specifically divided into three steps:

    Step 1: Accurately identify people in the picture

    After the camera captures a frame in real time, the system first filters for "people": a pre-trained YOLO 11 model discards irrelevant objects such as goods and shelves and focuses only on human outlines. Even when lighting flickers (backlight at a store entrance, exhibition-hall lights switching), the system can still track pedestrians stably, ensuring targets are not missed or misjudged because of lighting.

    Step 2: Track the movement trajectory of each person

    After a person is identified, the system assigns them a "temporary ID" and tracks their movement path in real time. For example, when someone walks in through the door, the system marks an "enter" event and starts timing. Notably, the tracking algorithm stores IDs: even if a target is briefly lost, the system does not count them again when they reappear in frame, it simply continues timing. This avoids the weakness of traditional infrared sensors, which register a new count every time the beam is blocked.

    Step 3: Automatically record the stay time

    When a person enters the monitoring area, the system will automatically start timing until they leave the area. At the same time, it will count data such as "average stay time" and "maximum stay time", which are of great practical value. For example, in retail scenarios, it can show how long customers stay in front of which shelves; in exhibition halls, it can analyze which exhibition areas are the most attractive. The timing logic does not rely on network time, but is accurately calculated based on the frame rate of the picture, ensuring that even if the network is disconnected, it can be accurately recorded.
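    The three steps above map fairly directly onto a few lines of Python. The sketch below uses the Ultralytics YOLO tracking API as a stand-in for the demo's detection and tracking stage; the video filename and FPS value are placeholders, and the actual demo runs on the RV1126B with its own runtime, so this only illustrates the counting and frame-based dwell-time logic.

```python
from collections import defaultdict
from ultralytics import YOLO  # assumed available; stand-in for the on-device runtime

model = YOLO("yolo11n.pt")     # person detector (class 0 = person in COCO)
FPS = 25                       # placeholder; dwell time is derived from frame count, not wall-clock

frames_seen = defaultdict(int)  # track_id -> number of frames the person was visible

# persist=True keeps track IDs across frames so a briefly lost person is not re-counted
for result in model.track(source="entrance.mp4", classes=[0], persist=True, stream=True):
    if result.boxes.id is None:
        continue
    for track_id in result.boxes.id.int().tolist():
        frames_seen[track_id] += 1

total_visitors = len(frames_seen)
dwell_seconds = [count / FPS for count in frames_seen.values()]
print(f"visitors: {total_visitors}")
print(f"average stay: {sum(dwell_seconds) / max(total_visitors, 1):.1f}s, "
      f"max stay: {max(dwell_seconds, default=0):.1f}s")
```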

    Solution Features

    Local processing, faster response

    All calculations (recognition, tracking, timing) are completed on the RV1126B chip and do not need to be sent to the cloud. This means lower latency and no data lag caused by network congestion, which is particularly important for stores that need to adjust staffing in real time.

    Flexible adaptation to different scenarios

    The system has built-in adjustable "confidence parameters": in a crowded supermarket, the recognition threshold can be raised to avoid misjudgments in dense crowds; in a boutique with fewer customers, the threshold can be lowered to ensure every customer entering the store is recorded. We can also define "effective areas" (for example, counting only people who enter the store and ignoring pedestrians passing the door), which adapts to different venue layouts through simple configuration.
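    As a minimal sketch of how those two knobs might be applied to raw detections, the snippet below filters by confidence and then checks the bottom-center of each bounding box against a polygonal "effective area" using OpenCV's point-in-polygon test; the threshold and polygon are made-up examples, not the demo's actual configuration.

```python
import cv2
import numpy as np

# Made-up example configuration: tune per venue
CONF_THRESHOLD = 0.5  # raise in crowded supermarkets, lower in quiet boutiques
ENTRANCE_ROI = np.array([[100, 200], [540, 200], [540, 470], [100, 470]],
                        dtype=np.float32).reshape(-1, 1, 2)

def keep_detection(confidence: float, box_xyxy: tuple) -> bool:
    """Keep a detection only if it is confident enough and its feet fall inside the ROI."""
    if confidence < CONF_THRESHOLD:
        return False
    x1, y1, x2, y2 = box_xyxy
    foot_point = ((x1 + x2) / 2.0, float(y2))   # bottom-center of the bounding box
    return cv2.pointPolygonTest(ENTRANCE_ROI, foot_point, False) >= 0

print(keep_detection(0.8, (300, 250, 360, 450)))   # inside the area -> True
print(keep_detection(0.8, (10, 10, 60, 150)))      # passer-by outside the area -> False
```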

    Two deployment methods, choose as needed

    According to different needs, two implementation paths are provided:

    - If you need to quickly test the effect, you can run it in Python script mode, complete the configuration within a few minutes, which is suitable for makers to quickly verify ideas;

    - If you pursue long-term stable operation, you can cross-compile based on C++ language to generate efficient execution files, reduce power...

    Read more »

  • Main Control Selection - RV1126B RTSP streaming and camera function testing

    Deng MingXi 07/18/2025 at 06:16 0 comments

    When exploring "How to build an AI camera", clarifying its core functional positioning is crucial. AI cameras have a wide range of application scenarios, whether it is real-time monitoring in intelligent security, detail capture in industrial quality inspection, or dynamic recording in home care. The three major functions of RTSP streaming, photo shooting, and video recording are core pillars, directly determining the practical value of AI cameras in different scenarios.

    Currently, many related products and projects face significant pain points in practical applications: RTSP streaming is prone to instability and excessive latency, which greatly affects scenarios requiring real-time feedback (such as security monitoring and industrial assembly line monitoring); when taking photos and recording videos, problems like frame freezes and blurriness occur frequently, seriously impacting the experience for both home users recording life moments and enterprises using them for document shooting and scene preservation. Therefore, a stable RTSP streaming solution combined with smooth photo-taking and video-recording capabilities is a core element in creating an excellent AI camera.

    After comparing multiple products, we chose the RV1126B chip as the core processor for testing. The reasons for selecting it are mainly twofold: 

    First, its RTSP streaming function is stable, supporting both H.264 and H.265 encoding formats, and it can adjust the bitrate according to network conditions and device performance, outputting multiple streams to adapt to different network environments, whether it is a home network with limited bandwidth or a demanding industrial local area network; 

    Second, the RV1126B has excellent ISP image processing capabilities, ensuring the clarity of photos and the smoothness of video recording, providing high-quality output for both long-distance capture in security scenarios and dynamic video recording in home scenarios.

    In conclusion, the powerful encoding/decoding and network streaming capabilities of the RV1126B make it a strong candidate for AI Cameras.

    Below, I will test the three major functions of the camera: taking photos, recording videos, and RTSP streaming. I will examine the clarity, color contrast, and level of detail presentation of the photos taken, and at the same time, I will introduce photos taken by the iPhone 15 at 4K resolution as a benchmark for comparison. For videos, I will focus more on picture smoothness, frame rate stability, and storage costs. For the RTSP part, streaming stability, anti-interference ability, and real-time performance are the points I value.
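    For the RTSP part, one simple client-side check is to pull the stream on a PC and measure the received frame rate, as in the hedged Python/OpenCV sketch below; the URL is a placeholder, and this measures the whole network path rather than the RV1126B encoder alone.

```python
import time
import cv2

RTSP_URL = "rtsp://192.168.1.100:554/stream0"   # placeholder; use the camera's actual URL

cap = cv2.VideoCapture(RTSP_URL)
if not cap.isOpened():
    raise SystemExit("could not open RTSP stream")

frames, start = 0, time.time()
while frames < 300:                  # sample ~300 frames, then report
    ok, frame = cap.read()
    if not ok:                       # dropped frame / network hiccup
        print("frame read failed")
        break
    frames += 1

elapsed = time.time() - start
print(f"received {frames} frames in {elapsed:.1f}s -> {frames / elapsed:.1f} fps")
cap.release()
```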

    Camera Photo Performance

    As an AI camera, its photo quality is crucial. The stock RV1126B kit is equipped with a camera unit that supports 4K resolution, fully meeting the needs of security and visual inspection. Below is a comparison of images taken by the RV1126B and the iPhone at the same 4K resolution. We can see that the RV1126B image has slight edge distortion, and its color contrast is slightly inferior to the iPhone's. However, this is partly because the iPhone performs automatic post-processing on its images, boosting color contrast and correcting distortion. In terms of sharpness, the RV1126B capture is no worse than the iPhone's. (Because Hackaday limits upload file sizes, the images shown here are compressed.)

    It is worth mentioning that on the IP Camera's website, users can adjust parameters such as brightness, contrast, exposure, and backlight compensation to obtain higher-quality images.

    Video Recording

    The quality of video recording is an even more crucial basic indicator for a camera, as it directly affects the accuracy of visual algorithm recognition. A good camera must deliver excellent clarity and detail restoration in recorded frames, smooth images, stable frame rates, and low storage costs.

    The video...

    Read more »
