Tuesday, December 3, 2024

AWS GCP Azure Certification path

Intended Path

AWS Cloud Foundational
GCP Cloud Foundational
Azure Fundamentals
AWS AI Foundational
Azure AI Fundamentals
Azure Data Fundamentals
AWS Solutions Architect Associate
GCP Cloud Architect
AWS Solutions Architect Professional
Azure Solutions Architect
AWS Security
GCP Security
Azure Security
AWS Machine Learning Associate
GCP Machine Learning
GCP Data
AWS Machine Learning

Path by Platform

AWS Cloud Foundational
AWS AI Foundational
AWS Solutions Architect Associate
AWS Solutions Architect Professional
AWS Security
AWS Machine Learning Associate
AWS Machine Learning

GCP Cloud Foundational
GCP Cloud Engineer
GCP Cloud Architect
GCP Security
GCP Machine Learning
GCP Data

Azure Fundamentals
Azure AI Fundamentals
Azure Data Fundamentals
Azure Solutions Architect
Azure Security

Cost

  1. AWS Cloud Foundational
    • Cost: $100
    • Training Materials: $50
    • Duration: 1 month
    • Value: Provides foundational knowledge of AWS Cloud, essential for understanding cloud services and terminology
  2. GCP Cloud Foundational
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Validates basic knowledge of Google Cloud services and cloud computing principles
  3. Azure Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Demonstrates foundational knowledge of cloud concepts and core Azure services
  4. AWS AI Foundational
    • Cost: $100
    • Training Materials: $50
    • Duration: 1 month
    • Value: Validates understanding of AI and ML concepts on AWS
  5. Azure AI Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Demonstrates knowledge of AI and ML concepts and Azure services
  6. Azure Data Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Provides foundational knowledge of core data concepts and Azure data services
  7. AWS Solutions Architect Associate
    • Cost: $150
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates ability to design and deploy scalable systems on AWS
  8. GCP Cloud Architect
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates ability to design and manage solutions on Google Cloud
  9. AWS Solutions Architect Professional
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates advanced skills in designing complex AWS solutions
  10. Azure Solutions Architect
    • Cost: $165
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates expertise in designing cloud and hybrid solutions on Azure
  11. AWS Security
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates expertise in securing AWS environments
  12. GCP Security
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates ability to design and implement secure solutions on Google Cloud
  13. Azure Security
    • Cost: $165
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates skills in implementing security controls and threat protection on Azure
  14. AWS Machine Learning Associate
    • Cost: $150
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates ability to design, implement, and maintain machine learning solutions on AWS
  15. GCP Machine Learning
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates proficiency in designing and implementing ML models on Google Cloud
  16. GCP Data
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates skills in designing and managing data solutions on Google Cloud
  17. AWS Machine Learning
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates advanced skills in machine learning on AWS


Certification cost estimate: $2926

Training materials cost estimate: $850

Total Cost: $3776

Estimated time to completion (part-time): ~2.5 years (the durations above total 28 months)



Sunday, April 23, 2023

Passive income from the computer

Rent out your unused bandwidth as a proxy. Check if your ISP allows it.

https://www.honeygain.com/

https://packetstream.io/

Packity


https://www.loadteam.com/ : Run jobs on your computer

https://www.kryptex.com/en/ : Mine cryptocurrency & get paid


Sunday, April 2, 2023

React hooks overview


  • useState: set the initial state directly, or pass a function (lazy initializer) so an expensive computation runs only on the first render. Updates that depend on the current value should use the functional form to read the previous state & then update.
  • useEffect: used for side effects like invoking APIs. Return a cleanup function, e.g. to remove an event listener.
  • useMemo: memoize an expensive computation, such as a slow function. Caches the previous value. Also useful for keeping a stable object reference (referential equality) & avoiding unnecessary updates.
  • useRef: hold a reference to a DOM node or component. Normally prefer state for anything that should trigger updates; use a ref only when you have to. Can also store a mutable value across renders without causing re-renders (updating state does cause re-renders).
  • useContext: share data between components without passing props down through the hierarchy.
  • useReducer: centralize state transitions in a reducer & pass down a single dispatch instead of many event handlers.
  • useCallback: memoize a function reference so it stays stable between renders.
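A minimal sketch tying several of these hooks together (assumes React 18+ in a browser; the component, its props & the localStorage key are all illustrative):

```tsx
import { useState, useEffect, useMemo, useCallback, useRef } from "react";

// Illustrative search box demonstrating the hooks listed above.
function SearchBox({ items }: { items: string[] }) {
  // Lazy initializer: the function runs only on the first render.
  const [query, setQuery] = useState(() => localStorage.getItem("q") ?? "");
  // DOM ref: reading/writing it never triggers a re-render.
  const inputRef = useRef<HTMLInputElement>(null);

  // Side effect with cleanup: persist the query, debounced.
  useEffect(() => {
    const id = setTimeout(() => localStorage.setItem("q", query), 300);
    return () => clearTimeout(id); // cleanup runs before the next effect & on unmount
  }, [query]);

  // Memoized derived value: recomputed only when items or query change,
  // and keeps a stable array reference between unrelated re-renders.
  const matches = useMemo(
    () => items.filter((item) => item.includes(query)),
    [items, query]
  );

  // Stable function reference, safe to pass to memoized children.
  const clear = useCallback(() => {
    setQuery("");
    inputRef.current?.focus();
  }, []);

  return (
    <div>
      <input ref={inputRef} value={query} onChange={(e) => setQuery(e.target.value)} />
      <button onClick={clear}>Clear</button>
      <ul>{matches.map((m) => <li key={m}>{m}</li>)}</ul>
    </div>
  );
}
```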

Monday, March 27, 2023

Kubernetes container basics & security

Developed at Google, drawing on its internal Borg system, & later open-sourced.

Pros: Scalable, HA, self-healing, auto-rollback, horizontal scaling. Portable across on-prem, hybrid & multiple public clouds. Uniform way to package, deploy & manage apps.
Cons: Complex to set up & operate. Higher costs. Often avoided by startups.
Popular balance: hand control-plane management to a managed k8s service such as EKS, AKS or GKE, which handles setup, configuration, scaling & support. Common in mid-sized orgs.

Creates a cluster of nodes.

  • Cluster has a control plane that manages the state of the cluster. In production, it runs on multiple nodes across datacenters.
    • Controller manager
      • Runs controllers that manage the state of the cluster. Eg: the replication controller maintains pod replicas; the deployment controller manages updates & rollbacks.
    • Scheduler: schedules pods onto worker nodes, deciding which worker node to use based on resource availability & requirements.
    • etcd: Distributed KV store holding the cluster's persistent state.
    • API server
      • Interface between the control plane & worker nodes. Exposes a REST API consumed by the admin UI & CLI (sketch below).
  • Cluster has worker nodes, each a physical machine or VM.
    • Each node runs pods, the smallest deployable object in k8s. Created & managed by the control plane.
      • Each pod holds one or more containers, which share storage & networking.
    • kubelet: Daemon that runs on each worker node. Communicates with the control plane. Maintains the desired state of pods.
    • kube-proxy: Network proxy that runs on each worker node. Routes traffic to pods & load-balances, distributing traffic evenly across pods.
    • container runtime: Runs containers on worker nodes. Pulls images from a registry. Starts & stops containers. Manages container resources.
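All of these interactions flow through the API server, the same path kubectl uses. A minimal sketch with the @kubernetes/client-node package (pre-1.0 API shown; newer releases take an options object), assuming a valid ~/.kube/config:

```ts
import * as k8s from "@kubernetes/client-node";

// Load kubeconfig & talk to the API server, just as the CLI would.
const kc = new k8s.KubeConfig();
kc.loadFromDefault(); // reads ~/.kube/config
const core = kc.makeApiClient(k8s.CoreV1Api);

async function listPods(namespace = "default") {
  const res = await core.listNamespacedPod(namespace);
  for (const pod of res.body.items) {
    console.log(pod.metadata?.name, "->", pod.status?.phase);
  }
}

listPods().catch(console.error);
```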

Security:
Detection: Detect breach after it has happened.
Prevention: Prevent breach from happening in the first place.

Shift Left: Can container security issues be detected before the breach?

Scan the packages in the container image.
Secure the host.
Run a monitoring agent as a privileged container.
Check container settings: is the container privileged? Audit the auto-scaling config: is it scaled down enough to degrade service, or scaled up so high that it drives up costs?
Fetch events & write them to a queuing service like Kafka. Consumers persist them to an AP system (Cassandra or DynamoDB) and/or Elasticsearch. If required, such as for config info, also use a CA RDBMS like Postgres.
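A minimal producer sketch for the front of that pipeline, assuming the kafkajs package; the event shape, topic name & broker address are illustrative:

```ts
import { Kafka } from "kafkajs";

// Hypothetical shape for a container-security finding.
interface ContainerEvent {
  node: string;
  pod: string;
  privileged: boolean;
  finding: string;
  ts: number;
}

const kafka = new Kafka({ clientId: "container-audit", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function publish(events: ContainerEvent[]) {
  await producer.connect();
  await producer.send({
    topic: "container-security-events", // consumers fan out to Cassandra / Elasticsearch
    messages: events.map((e) => ({ key: e.pod, value: JSON.stringify(e) })),
  });
  await producer.disconnect();
}

publish([
  { node: "worker-1", pod: "api-7d9f", privileged: true, finding: "privileged container", ts: Date.now() },
]).catch(console.error);
```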

Tuesday, February 28, 2023

Find duplicate text data across petabytes spanning multiple data stores

Yes, there are some solutions using machine learning and artificial intelligence to find duplicates in large-scale data. One possible solution is to use **fuzzy matching**⁴ techniques that can compare text objects based on their similarity scores rather than exact matches. Fuzzy matching can handle variations in spelling, grammar, formatting etc. and can also be combined with other features such as name, address, date of birth etc. to identify duplicate records.
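For illustration, a minimal sketch of Levenshtein edit distance, a standard fuzzy-matching primitive (plain dynamic programming, not tied to any product cited here):

```ts
// dp[i][j] = edits to turn the first i chars of a into the first j chars of b.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
  return dp[a.length][b.length];
}

// levenshtein("kitten", "sitting") === 3
```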


Another possible solution is to use **clustering**⁵ methods that can group similar text objects together based on their features or topics. Clustering can help reduce the number of comparisons needed and also discover hidden patterns or themes in the data.


Both fuzzy matching and clustering can be enhanced by using **machine learning**⁴⁵ algorithms that can learn from historical data and improve their accuracy and performance over time. Machine learning can also help automate the process of duplicate detection and provide feedback or suggestions to users.


However, these solutions may also have some challenges such as scalability, security, privacy etc. that need to be addressed depending on the type and source of data.




Sources (conversation with Bing, accessed 2/28/2023):

(1) Using Fuzzy Matching Plus Artificial Intelligence to Identify Duplicate .... https://www.datarobot.com/blog/using-fuzzy-matching-plus-artificial-intelligence-to-identify-duplicate-customers/

(2) Detecting Major Incidents using Automated Intelligence and Machine Learning. https://www.bmc.com/blogs/detecting-major-incidents-using-ai-ml/

(3) Duplicate Document Detection using Machine Learning. https://www.inettutor.com/source-code/duplicate-document-detection-using-machine-learning/

(4) Duplicate Detection With AI - PSI Software AG. https://www.psi.de/en/blog/psi-blog/post/duplicate-detection-with-ai-triple-boost-for-data-consistency-and-increased-user-friendliness-33/

(5) Duplicate Document Detection using Machine Learning. https://www.inettutor.com/source-code/duplicate-document-detection-using-machine-learning/



There are different machine learning algorithms that can help with duplicate text detection or plagiarism. Some examples are:


- **Support Vector Machine (SVM)**⁵: This is a supervised learning algorithm that can classify texts into similar or non-similar categories based on their features. SVM can handle high-dimensional data and nonlinear relationships but it may require a lot of training data and tuning parameters.


- **Random Forest (RF)**⁵: This is an ensemble learning algorithm that can combine multiple decision trees to make predictions based on majority voting. RF can handle noisy and imbalanced data and reduce overfitting but it may be slow and complex to train and interpret.


- **Multi-Layer Perceptron Neural Network (MLPNN)**⁵: This is an artificial neural network that can learn complex nonlinear functions by adjusting the weights of its hidden layers. MLPNN can capture semantic similarities and handle large-scale data but it may require a lot of computational resources and suffer from local minima.


- **Deep Qualicision AI Framework**²: This is a framework that combines qualitative labeling with a knowledge base of historicized data trained by machine learning. It uses similarity metrics to compare text objects based on their content, structure, context etc. It can handle heterogeneous and dynamic data and provide feedback or suggestions to users but it may require domain-specific knowledge and customization.




Sources (conversation with Bing, accessed 2/28/2023):

(1) Text Similarity Detection Using Machine Learning Algorithms with .... https://link.springer.com/chapter/10.1007/978-3-030-74728-2_2

(2) Plagiarism Checker - Free Online Chat GPT Detector Tool by Grammica. https://grammica.com/chatgpt-detector

(3) AI Content Detectors Tools to Detect Fake Content [2023 Trending]. https://bing.com/search?q=machine+learning+algorithms+duplicate+text+detection+plagiarism

(4) AI Content Detectors Tools to Detect Fake Content [2023 Trending]. https://redblink.com/top-ai-content-detector-tools/

(5) Role of Artificial Intelligence in Plagiarism Detection - AnalytixLabs. https://www.analytixlabs.co.in/blog/artificial-intelligence-in-plagiarism-detection/

(6) Semantic Duplicate Identification with Parsing and Machine Learning .... https://link.springer.com/chapter/10.1007/978-3-642-15760-8_12


To design a system to find duplicate or similar text data within petabytes of data, you could use the following approach:


Indexing: Create an indexing system that tracks the file names and sizes of all text files in your storage. You can use an indexing system like Elasticsearch or Solr to store the metadata of the text files.


Chunking: Break the text data into smaller chunks, which will be compared against each other to find duplicates or similarities. The chunk size can depend on the size of the text files and the available hardware resources.


Deduplication: Compare the hash values of the chunks to identify identical chunks. Deduplicate by keeping only one copy of each identical chunk, and recording references to it in a database.
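A minimal sketch of that chunk-hash-and-reference scheme (fixed-size chunks & an in-memory map stand in for content-defined chunking & a real database):

```ts
import { createHash } from "node:crypto";

// Fixed-size chunking for simplicity; production systems usually use
// content-defined chunking so small edits don't shift every boundary.
function chunk(text: string, size = 4096): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

// hash -> the single stored copy of that chunk (a database in a real system).
const chunkStore = new Map<string, string>();

// Returns the file's content as a list of chunk references.
function dedupe(text: string): string[] {
  return chunk(text).map((c) => {
    const h = createHash("sha256").update(c).digest("hex");
    if (!chunkStore.has(h)) chunkStore.set(h, c); // keep one copy only
    return h;
  });
}
```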


Similarity detection: Use algorithms like Levenshtein distance or cosine similarity to identify similar chunks that are not identical but have a high degree of similarity. Store references to these chunks in the database as well.
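A minimal sketch of cosine similarity over term-frequency vectors; the tokenizer here is deliberately naive:

```ts
// Build a sparse term-frequency vector from raw text.
function tf(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().split(/\W+/).filter(Boolean))
    v.set(w, (v.get(w) ?? 0) + 1);
  return v;
}

// Cosine similarity: dot product over the product of magnitudes.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) { na += x * x; dot += x * (b.get(w) ?? 0); }
  for (const x of b.values()) nb += x * x;
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// cosine(tf("the quick brown fox"), tf("the quick red fox")) === 0.75
```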


Comparison: Compare the metadata and text content of files to identify duplicates across files. For example, two files with the same size and hash value could be considered duplicates, and two files with a high number of identical or similar chunks could be considered similar.


Visualization: Create a visualization tool to show the duplicated and similar data in a user-friendly way.


Reporting: Generate reports to provide information on the amount of space saved by deduplication, and to highlight any potential data integrity issues.


Automation: Automate the system to periodically check for duplicates and similar data, so that new data is automatically checked and processed.


The above approach can be implemented using a distributed computing framework like Apache Spark or Hadoop, and can be run on-premise or in the cloud. It can also be customized to meet specific requirements, such as using different algorithms for similarity detection, or adjusting the chunk size for optimal performance.


Yes, there are approaches using machine learning and artificial intelligence that can be used to find similar text data across petabytes of data in multiple storage systems. One such approach is to use a technique called "text embedding" or "vectorization".


Text embedding involves representing text data as numerical vectors that capture the meaning of the text. This is typically done by training a machine learning model, such as a neural network, on a large corpus of text data, such as Wikipedia or a news dataset. The model is trained to predict the context in which words appear, and in the process, it learns to represent words and phrases as vectors that capture their meaning.


Once the model is trained, it can be used to embed new text data, such as the text files in your petabyte-scale storage, into the same vector space. The vectors can then be compared to identify similar text data.


Here are the general steps to implement this approach:


Data preparation: Collect and preprocess the text data from your petabyte-scale storage. This may involve converting the data into a standard format, removing stop words and punctuation, and tokenizing the text into words or phrases.


Train an embedding model: Train a machine learning model, such as a Word2Vec or Doc2Vec model, on a large corpus of text data. The model should be trained to predict the context in which words or phrases appear.


Embed the text data: Use the trained model to embed the text data in your petabyte-scale storage into the same vector space as the training data.


Similarity search: Use algorithms such as nearest neighbors or cosine similarity to search for similar text data within the vector space.
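A minimal sketch of this step, assuming the vectors were already produced by some embedding model. The linear scan shown is exactly what an approximate nearest-neighbor index (e.g. HNSW) replaces at petabyte scale:

```ts
type EmbeddedDoc = { id: string; vec: number[] };

// Cosine similarity over dense embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na * nb) || 1);
}

// Brute-force k-nearest-neighbors: rank all docs by similarity to the query.
function nearest(query: number[], docs: EmbeddedDoc[], k = 5): EmbeddedDoc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k);
}
```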


Visualization: Create a visualization tool to show the similar text data in a user-friendly way.


Reporting: Generate reports to provide information on the amount of space saved by identifying similar text data, and to highlight any potential data integrity issues.


Automation: Automate the system to periodically check for similar text data, so that new data is automatically checked and processed.


This approach can be implemented using a distributed computing framework like Apache Spark or Hadoop, and can be run on-premise or in the cloud. It can also be customized to meet specific requirements, such as using different embedding models or similarity search algorithms for optimal performance.





Monday, January 30, 2023

Sample Interview questions

Create a table component in React.
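One possible starting point, as a sketch: a generic, typed table component. The Column shape & prop names are illustrative, not a required API:

```tsx
import { ReactNode } from "react";

// Each column declares a header & how to render a cell for a row.
type Column<T> = { header: string; render: (row: T) => ReactNode };

function Table<T>({ columns, rows }: { columns: Column<T>[]; rows: T[] }) {
  return (
    <table>
      <thead>
        <tr>{columns.map((c) => <th key={c.header}>{c.header}</th>)}</tr>
      </thead>
      <tbody>
        {rows.map((row, i) => (
          // Index keys are fine for static data; prefer a stable row id otherwise.
          <tr key={i}>
            {columns.map((c) => <td key={c.header}>{c.render(row)}</td>)}
          </tr>
        ))}
      </tbody>
    </table>
  );
}

// Usage: <Table columns={[{ header: "Name", render: (u) => u.name }]} rows={users} />
```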

Create an analog clock component in html, css & React.

Practice interviews on DevSkiller.com.

Explain the CI/CD process for a front-end engineer & how you'd implement it with Kubernetes & Docker.



Return the path from the root node to the deepest node of a tree.
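A minimal recursive sketch for a binary tree; the TreeNode shape is assumed, & ties in depth are broken toward the left subtree:

```ts
interface TreeNode<T> { value: T; left?: TreeNode<T>; right?: TreeNode<T>; }

// Returns the values along the root-to-deepest-node path.
function deepestPath<T>(root: TreeNode<T> | undefined): T[] {
  if (!root) return [];
  const left = deepestPath(root.left);
  const right = deepestPath(root.right);
  const deeper = left.length >= right.length ? left : right;
  return [root.value, ...deeper];
}

// deepestPath({ value: 1, left: { value: 2, left: { value: 4 } }, right: { value: 3 } })
// -> [1, 2, 4]
```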

Free AI Chat tools

https://grok.com
https://x.com/i/grok
https://chatgpt.com
https://copilot.microsoft.com
https://chat.deepseek.com
https://www.meta.ai
https:...