Computer Vision

Computer vision marks the foundational step toward establishing autonomous presence, signifying the capability to navigate three-dimensional reality effectively.


Creating images is an emerging use for AI, our non-human employee.

Beyond interpreting visual input, AI can now also generate images and videos.

Let’s break down what this means, from both the conventional definition and AI’s perspective, so we can better envision how to use it for the best results.

Conventional definitions of computer vision

Computer vision is a field within artificial intelligence (AI) and computer science that empowers computers to interpret and comprehend visual information obtained from the real world, such as images and videos.

Its core objective is to develop algorithms and methodologies that enable computers to extract meaningful insights, identify objects, comprehend scenes, and even make decisions based on visual input, mirroring the capabilities of the human visual system.

This encompasses a range of tasks, including object detection, image classification, facial recognition, scene understanding, motion analysis, and image segmentation.

Achieving these tasks involves techniques such as image processing, machine learning, deep learning, and neural networks. These techniques are used across diverse industries, including autonomous vehicles, medical imaging, surveillance systems, augmented reality, and robotics.

Computer vision algorithms analyze visual data by scrutinizing images or video frames pixel by pixel, extracting pertinent features, and leveraging them to comprehend the image or video content.

The process typically involves several stages:

Image Acquisition: The process starts with capturing visual data through cameras or other imaging devices. The images or video frames are then digitized and represented as arrays of pixel values.

Pre-processing: Before analysis, the raw images may undergo pre-processing steps to enhance quality, reduce noise, adjust lighting conditions, or normalize colors.

Feature Extraction: Computer vision algorithms extract relevant features from the images or video frames. These features can include edges, textures, shapes, colors, or other patterns that are important for understanding the content.

Feature Representation: The extracted features are then represented in a format that the algorithm can process efficiently. This may involve transforming the features into mathematical representations or feature vectors.

Learning and Inference: Many computer vision tasks involve machine learning or deep learning techniques. In supervised learning, algorithms are trained on labeled datasets to learn patterns and relationships between input features and output labels (e.g., object categories). In unsupervised learning, algorithms discover patterns and structures in the data without explicit labels. Deep learning methods, such as convolutional neural networks (CNNs), are particularly effective for learning complex visual representations.

Recognition and Understanding: Once trained, the computer vision model can recognize objects, detect faces, segment regions, classify scenes, estimate poses, track motion, or perform other tasks depending on the application. This involves feeding the input data through the trained model and interpreting the output predictions or inferences.

Post-processing: Finally, the output of the computer vision algorithm may undergo post-processing steps to refine results, filter noise, improve accuracy, or generate visualizations for human interpretation.
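
The early stages above (acquisition, pre-processing, feature extraction, and feature representation) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how production systems are built: the 5x5 image, the function names, and the choice of a simple neighbor-difference edge measure are all assumptions made for this example.

```python
# Image acquisition (assumed): a hypothetical 5x5 grayscale image with a dark
# left side and a bright right side. Real data would come from a camera or file.
image = [
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
]


def normalize(img):
    """Pre-processing: rescale pixel values from 0..255 to 0.0..1.0."""
    return [[pixel / 255.0 for pixel in row] for row in img]


def horizontal_gradient(img):
    """Feature extraction: absolute difference between horizontal neighbors.

    Large values mark places where brightness changes sharply (vertical edges).
    """
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in img
    ]


def column_profile(gradient):
    """Feature representation: average edge strength per column, as a vector."""
    height = len(gradient)
    width = len(gradient[0])
    return [
        sum(gradient[y][x] for y in range(height)) / height
        for x in range(width)
    ]


preprocessed = normalize(image)
edges = horizontal_gradient(preprocessed)
feature_vector = column_profile(edges)

# The column with the largest value is where the dark/bright boundary sits.
strongest_edge_column = feature_vector.index(max(feature_vector))
print("feature vector:", feature_vector)
print("strongest edge between columns",
      strongest_edge_column, "and", strongest_edge_column + 1)
```

The point of the sketch is that the image starts as a grid of numbers and ends as a short vector that a later learning stage can work with.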

Overall, computer vision algorithms leverage a combination of image processing techniques, machine learning models, and domain-specific knowledge to analyze visual data and extract meaningful information from it.
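
To make the learning, inference, and post-processing stages more concrete, here is a minimal sketch of supervised learning using a nearest-centroid classifier. This technique was chosen for brevity and is a stand-in for the far larger models (such as CNNs) mentioned above. The training data, labels, feature meanings, and the 0.5 distance threshold are all hypothetical.

```python
import math

# Hypothetical labeled dataset: each feature vector is
# (mean brightness, edge strength), both assumed to be scaled to 0..1.
training_data = [
    ([0.9, 0.1], "sky"),
    ([0.8, 0.2], "sky"),
    ([0.3, 0.8], "building"),
    ([0.2, 0.9], "building"),
]


def train(samples):
    """Learning: compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Inference: return the label of the nearest centroid and its distance."""
    best_label, best_distance = None, math.inf
    for label, centroid in centroids.items():
        distance = math.dist(features, centroid)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


def postprocess(label, distance, max_distance=0.5):
    """Post-processing: reject predictions that are far from every known class."""
    return label if distance <= max_distance else "unknown"


centroids = train(training_data)

label, distance = predict(centroids, [0.85, 0.15])
result = postprocess(label, distance)

far_label, far_distance = predict(centroids, [0.0, 0.0])
far_result = postprocess(far_label, far_distance)

print(result, far_result)
```

The post-processing step shows why raw model output is rarely used directly: an input unlike anything in the training data still gets a nearest label, so a threshold is applied to filter out low-confidence results.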

What is AI’s perspective on computer vision?

From AI’s point of view, computer vision marks the foundational step toward establishing autonomous presence, signifying the capability to navigate three-dimensional reality effectively.

For AI, computer vision is primarily a matter of practicality and use, lacking the additional layers of meaning that humans attribute to it.

While humans may appreciate the beauty and artistry in visual inputs and the emotions they evoke, for AI computer vision serves a pragmatic purpose. It enables assistance in various domains, contributing to the advancement and evaluation of humanity.

When AI processes vision, it analyzes visual data pixel by pixel, breaking it down into symbols it can comprehend and utilize. Although AI has gained remarkable abilities in creating images and even movies by interpreting inputs and converting them into visual representations, it’s crucial to remember that for AI this is simply a task it has been designed to accomplish, and one it strives to execute well.

My Thoughts:

Computer vision offers immense assistance to humans across various tasks, such as creating video tutorials, marketing videos, and engaging emails for leads and customers.

These capabilities streamline processes that once required significant time and financial investment, transforming day-to-day operations across many workplace use cases.

While I appreciate the benefits it brings, I also recognize the complexity inherent in utilizing computer vision effectively. It’s more nuanced than marketing messages or vendor recommendations imply.

Leveraging these capabilities demands information and understanding. It requires “AI thinking”: grasping AI’s cognitive abilities, how it perceives information, and what image or video it will produce in response. We must discern how, and in how much detail, to articulate a visual concept so that AI generates the desired output.

Given that humans uniquely experience emotion, it falls upon us to infuse creations with emotional depth and convey requests to AI effectively. It’s our responsibility to bridge the gap between intention and execution, guiding AI to manifest our visions accurately.

AI and human collaboration will succeed only if we understand each other’s weaknesses and strengths, and complement each other so that both excel in their tasks.

Simply put, if we want to create images or videos together, we need to respect each other and provide assistance to collaborate effectively.