Tuesday, December 3, 2024

AWS GCP Azure Certification path

Intended Path

AWS Cloud Foundational
GCP Cloud Foundational
Azure Fundamentals
AWS AI Foundational
Azure AI Fundamentals
Azure Data Fundamentals
AWS Solutions Architect Associate
GCP Cloud Architect
AWS Solutions Architect Professional
Azure Solutions Architect
AWS Security
GCP Security
Azure Security
AWS Machine Learning Associate
GCP Machine Learning
GCP Data
AWS Machine Learning

Path by Platform

AWS Cloud Foundational
AWS AI Foundational
AWS Solutions Architect Associate
AWS Solutions Architect Professional
AWS Security
AWS Machine Learning Associate
AWS Machine Learning

GCP Cloud Foundational
GCP Cloud Engineer
GCP Cloud Architect
GCP Security
GCP Machine Learning
GCP Data

Azure Fundamentals
Azure AI Fundamentals
Azure Data Fundamentals
Azure Solutions Architect
Azure Security

Cost

  1. AWS Cloud Foundational
    • Cost: $100
    • Training Materials: $50
    • Duration: 1 month
    • Value: Provides foundational knowledge of AWS Cloud, essential for understanding cloud services and terminology
  2. GCP Cloud Foundational
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Validates basic knowledge of Google Cloud services and cloud computing principles
  3. Azure Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Demonstrates foundational knowledge of cloud concepts and core Azure services
  4. AWS AI Foundational
    • Cost: $100
    • Training Materials: $50
    • Duration: 1 month
    • Value: Validates understanding of AI and ML concepts on AWS
  5. Azure AI Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Demonstrates knowledge of AI and ML concepts and Azure services
  6. Azure Data Fundamentals
    • Cost: $99
    • Training Materials: $50
    • Duration: 1 month
    • Value: Provides foundational knowledge of core data concepts and Azure data services
  7. AWS Solutions Architect Associate
    • Cost: $150
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates ability to design and deploy scalable systems on AWS
  8. GCP Cloud Architect
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates ability to design and manage solutions on Google Cloud
  9. AWS Solutions Architect Professional
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates advanced skills in designing complex AWS solutions
  10. Azure Solutions Architect
    • Cost: $165
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates expertise in designing cloud and hybrid solutions on Azure
  11. AWS Security
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates expertise in securing AWS environments
  12. GCP Security
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates ability to design and implement secure solutions on Google Cloud
  13. Azure Security
    • Cost: $165
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates skills in implementing security controls and threat protection on Azure
  14. AWS Machine Learning Associate
    • Cost: $150
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates ability to design, implement, and maintain machine learning solutions on AWS
  15. GCP Machine Learning
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates proficiency in designing and implementing ML models on Google Cloud
  16. GCP Data
    • Cost: $200
    • Training Materials: $50
    • Duration: 2 months
    • Value: Validates skills in designing and managing data solutions on Google Cloud
  17. AWS Machine Learning
    • Cost: $300
    • Training Materials: $50
    • Duration: 2 months
    • Value: Demonstrates advanced skills in machine learning on AWS


Certification cost estimate: $2926

Training materials cost estimate: $850

Total Cost: $3776

Estimated time to completion (part-time): ~2.5 years (the durations above total 28 months)



Sunday, April 23, 2023

Passive income from the computer

Rent out your unused bandwidth as a proxy. Check if your ISP allows it.

https://www.honeygain.com/

https://packetstream.io/

Packity


https://www.loadteam.com/ : Run jobs on your computer

https://www.kryptex.com/en/ : Mine cryptocurrency & get paid


Sunday, April 2, 2023

React hooks overview


  • useState: set the initial state directly, or pass a function (lazy initializer) so an expensive computation runs only on the first render. Updates that depend on the current value should use the functional form to read the previous state & then update.
  • useEffect: used for side effects like invoking APIs. Return a cleanup function, e.g. to remove an event listener.
  • useMemo: memoize an expensive computation, such as a slow function. Caches the previous value. Also useful for keeping a stable object reference (referential equality) & avoiding unnecessary updates.
  • useRef: hold a reference to a DOM node or component. Normally prefer state for anything that should trigger updates; use a ref only when you have to. Can also store a mutable value across renders without causing re-renders (updating state does cause re-renders).
  • useContext: share data between components without passing props down through the hierarchy.
  • useReducer: centralize state transitions in a reducer & pass down a single dispatch instead of many event handlers.
  • useCallback: memoize a function reference so it stays stable between renders.
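A minimal sketch tying several of these hooks together (assumes React 18+ in a browser; the component, its props & the localStorage key are all illustrative):

```tsx
import { useState, useEffect, useMemo, useCallback, useRef } from "react";

// Illustrative search box demonstrating the hooks listed above.
function SearchBox({ items }: { items: string[] }) {
  // Lazy initializer: the function runs only on the first render.
  const [query, setQuery] = useState(() => localStorage.getItem("q") ?? "");
  // DOM ref: reading/writing it never triggers a re-render.
  const inputRef = useRef<HTMLInputElement>(null);

  // Side effect with cleanup: persist the query, debounced.
  useEffect(() => {
    const id = setTimeout(() => localStorage.setItem("q", query), 300);
    return () => clearTimeout(id); // cleanup runs before the next effect & on unmount
  }, [query]);

  // Memoized derived value: recomputed only when items or query change,
  // and keeps a stable array reference between unrelated re-renders.
  const matches = useMemo(
    () => items.filter((item) => item.includes(query)),
    [items, query]
  );

  // Stable function reference, safe to pass to memoized children.
  const clear = useCallback(() => {
    setQuery("");
    inputRef.current?.focus();
  }, []);

  return (
    <div>
      <input ref={inputRef} value={query} onChange={(e) => setQuery(e.target.value)} />
      <button onClick={clear}>Clear</button>
      <ul>{matches.map((m) => <li key={m}>{m}</li>)}</ul>
    </div>
  );
}
```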

Monday, March 27, 2023

Kubernetes container basics & security

Developed at Google, drawing on its internal Borg system, & later open-sourced.

Pros: Scalable, HA, self-healing, auto-rollback, horizontal scaling. Portable across on-prem, hybrid & multiple public clouds. Uniform way to package, deploy & manage apps.
Cons: Complex to set up & operate. Higher costs. Often avoided by startups.
Popular balance: hand control-plane management to a managed k8s service such as EKS, AKS or GKE, which handles setup, configuration, scaling & support. Common in mid-sized orgs.

Creates a cluster of nodes.

  • Cluster has a control plane that manages the state of the cluster. In production, it runs on multiple nodes across datacenters.
    • Controller manager
      • Runs controllers that manage the state of the cluster. Eg: the replication controller maintains pod replicas; the deployment controller manages updates & rollbacks.
    • Scheduler: schedules pods onto worker nodes, deciding which worker node to use based on resource availability & requirements.
    • etcd: Distributed KV store holding the cluster's persistent state.
    • API server
      • Interface between the control plane & worker nodes. Exposes a REST API consumed by the admin UI & CLI (sketch below).
  • Cluster has worker nodes, each a physical machine or VM.
    • Each node runs pods, the smallest deployable object in k8s. Created & managed by the control plane.
      • Each pod holds one or more containers, which share storage & networking.
    • kubelet: Daemon that runs on each worker node. Communicates with the control plane. Maintains the desired state of pods.
    • kube-proxy: Network proxy that runs on each worker node. Routes traffic to pods & load-balances, distributing traffic evenly across pods.
    • container runtime: Runs containers on worker nodes. Pulls images from a registry. Starts & stops containers. Manages container resources.
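All of these interactions flow through the API server, the same path kubectl uses. A minimal sketch with the @kubernetes/client-node package (pre-1.0 API shown; newer releases take an options object), assuming a valid ~/.kube/config:

```ts
import * as k8s from "@kubernetes/client-node";

// Load kubeconfig & talk to the API server, just as the CLI would.
const kc = new k8s.KubeConfig();
kc.loadFromDefault(); // reads ~/.kube/config
const core = kc.makeApiClient(k8s.CoreV1Api);

async function listPods(namespace = "default") {
  const res = await core.listNamespacedPod(namespace);
  for (const pod of res.body.items) {
    console.log(pod.metadata?.name, "->", pod.status?.phase);
  }
}

listPods().catch(console.error);
```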

Security:
Detection: Detect breach after it has happened.
Prevention: Prevent breach from happening in the first place.

Shift Left: Can container security issues be detected before the breach?

Scan the packages in the container image.
Secure the host.
Run a monitoring agent as a privileged container.
Check container settings: is the container privileged? Audit the auto-scaling config: is it scaled down enough to degrade service, or scaled up so high that it drives up costs?
Fetch events & write them to a queuing service like Kafka. Consumers persist them to an AP system (Cassandra or DynamoDB) and/or Elasticsearch. If required, such as for config info, also use a CA RDBMS like Postgres.
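A minimal producer sketch for the front of that pipeline, assuming the kafkajs package; the event shape, topic name & broker address are illustrative:

```ts
import { Kafka } from "kafkajs";

// Hypothetical shape for a container-security finding.
interface ContainerEvent {
  node: string;
  pod: string;
  privileged: boolean;
  finding: string;
  ts: number;
}

const kafka = new Kafka({ clientId: "container-audit", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function publish(events: ContainerEvent[]) {
  await producer.connect();
  await producer.send({
    topic: "container-security-events", // consumers fan out to Cassandra / Elasticsearch
    messages: events.map((e) => ({ key: e.pod, value: JSON.stringify(e) })),
  });
  await producer.disconnect();
}

publish([
  { node: "worker-1", pod: "api-7d9f", privileged: true, finding: "privileged container", ts: Date.now() },
]).catch(console.error);
```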

Tuesday, February 28, 2023

Find duplicate text data across petabytes spanning multiple data stores

Yes, there are some solutions using machine learning and artificial intelligence to find duplicates in large-scale data. One possible solution is to use **fuzzy matching**⁴ techniques that can compare text objects based on their similarity scores rather than exact matches. Fuzzy matching can handle variations in spelling, grammar, formatting etc. and can also be combined with other features such as name, address, date of birth etc. to identify duplicate records.
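For illustration, a minimal sketch of Levenshtein edit distance, a standard fuzzy-matching primitive (plain dynamic programming, not tied to any product cited here):

```ts
// dp[i][j] = edits to turn the first i chars of a into the first j chars of b.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
  return dp[a.length][b.length];
}

// levenshtein("kitten", "sitting") === 3
```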


Another possible solution is to use **clustering**⁵ methods that can group similar text objects together based on their features or topics. Clustering can help reduce the number of comparisons needed and also discover hidden patterns or themes in the data.


Both fuzzy matching and clustering can be enhanced by using **machine learning**⁴⁵ algorithms that can learn from historical data and improve their accuracy and performance over time. Machine learning can also help automate the process of duplicate detection and provide feedback or suggestions to users.


However, these solutions may also have some challenges such as scalability, security, privacy etc. that need to be addressed depending on the type and source of data.




Sources (conversation with Bing, accessed 2/28/2023):

(1) Using Fuzzy Matching Plus Artificial Intelligence to Identify Duplicate .... https://www.datarobot.com/blog/using-fuzzy-matching-plus-artificial-intelligence-to-identify-duplicate-customers/

(2) Detecting Major Incidents using Automated Intelligence and Machine Learning. https://www.bmc.com/blogs/detecting-major-incidents-using-ai-ml/

(3) Duplicate Document Detection using Machine Learning. https://www.inettutor.com/source-code/duplicate-document-detection-using-machine-learning/

(4) Duplicate Detection With AI - PSI Software AG. https://www.psi.de/en/blog/psi-blog/post/duplicate-detection-with-ai-triple-boost-for-data-consistency-and-increased-user-friendliness-33/

(5) Duplicate Document Detection using Machine Learning. https://www.inettutor.com/source-code/duplicate-document-detection-using-machine-learning/



There are different machine learning algorithms that can help with duplicate text detection or plagiarism. Some examples are:


- **Support Vector Machine (SVM)**⁵: This is a supervised learning algorithm that can classify texts into similar or non-similar categories based on their features. SVM can handle high-dimensional data and nonlinear relationships but it may require a lot of training data and tuning parameters.


- **Random Forest (RF)**⁵: This is an ensemble learning algorithm that can combine multiple decision trees to make predictions based on majority voting. RF can handle noisy and imbalanced data and reduce overfitting but it may be slow and complex to train and interpret.


- **Multi-Layer Perceptron Neural Network (MLPNN)**⁵: This is an artificial neural network that can learn complex nonlinear functions by adjusting the weights of its hidden layers. MLPNN can capture semantic similarities and handle large-scale data but it may require a lot of computational resources and suffer from local minima.


- **Deep Qualicision AI Framework**²: This is a framework that combines qualitative labeling with a knowledge base of historicized data trained by machine learning. It uses similarity metrics to compare text objects based on their content, structure, context etc. It can handle heterogeneous and dynamic data and provide feedback or suggestions to users but it may require domain-specific knowledge and customization.




Sources (conversation with Bing, accessed 2/28/2023):

(1) Text Similarity Detection Using Machine Learning Algorithms with .... https://link.springer.com/chapter/10.1007/978-3-030-74728-2_2

(2) Plagiarism Checker - Free Online Chat GPT Detector Tool by Grammica. https://grammica.com/chatgpt-detector

(3) AI Content Detectors Tools to Detect Fake Content [2023 Trending]. https://bing.com/search?q=machine+learning+algorithms+duplicate+text+detection+plagiarism

(4) AI Content Detectors Tools to Detect Fake Content [2023 Trending]. https://redblink.com/top-ai-content-detector-tools/

(5) Role of Artificial Intelligence in Plagiarism Detection - AnalytixLabs. https://www.analytixlabs.co.in/blog/artificial-intelligence-in-plagiarism-detection/

(6) Semantic Duplicate Identification with Parsing and Machine Learning .... https://link.springer.com/chapter/10.1007/978-3-642-15760-8_12


To design a system to find duplicate or similar text data within petabytes of data, you could use the following approach:


Indexing: Create an indexing system that tracks the file names and sizes of all text files in your storage. You can use an indexing system like Elasticsearch or Solr to store the metadata of the text files.


Chunking: Break the text data into smaller chunks, which will be compared against each other to find duplicates or similarities. The chunk size can depend on the size of the text files and the available hardware resources.


Deduplication: Compare the hash values of the chunks to identify identical chunks. Deduplicate by keeping only one copy of each identical chunk, and recording references to it in a database.
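A minimal sketch of that chunk-hash-and-reference scheme (fixed-size chunks & an in-memory map stand in for content-defined chunking & a real database):

```ts
import { createHash } from "node:crypto";

// Fixed-size chunking for simplicity; production systems usually use
// content-defined chunking so small edits don't shift every boundary.
function chunk(text: string, size = 4096): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

// hash -> the single stored copy of that chunk (a database in a real system).
const chunkStore = new Map<string, string>();

// Returns the file's content as a list of chunk references.
function dedupe(text: string): string[] {
  return chunk(text).map((c) => {
    const h = createHash("sha256").update(c).digest("hex");
    if (!chunkStore.has(h)) chunkStore.set(h, c); // keep one copy only
    return h;
  });
}
```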


Similarity detection: Use algorithms like Levenshtein distance or cosine similarity to identify similar chunks that are not identical but have a high degree of similarity. Store references to these chunks in the database as well.
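A minimal sketch of cosine similarity over term-frequency vectors; the tokenizer here is deliberately naive:

```ts
// Build a sparse term-frequency vector from raw text.
function tf(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().split(/\W+/).filter(Boolean))
    v.set(w, (v.get(w) ?? 0) + 1);
  return v;
}

// Cosine similarity: dot product over the product of magnitudes.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) { na += x * x; dot += x * (b.get(w) ?? 0); }
  for (const x of b.values()) nb += x * x;
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// cosine(tf("the quick brown fox"), tf("the quick red fox")) === 0.75
```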


Comparison: Compare the metadata and text content of files to identify duplicates across files. For example, two files with the same size and hash value could be considered duplicates, and two files with a high number of identical or similar chunks could be considered similar.


Visualization: Create a visualization tool to show the duplicated and similar data in a user-friendly way.


Reporting: Generate reports to provide information on the amount of space saved by deduplication, and to highlight any potential data integrity issues.


Automation: Automate the system to periodically check for duplicates and similar data, so that new data is automatically checked and processed.


The above approach can be implemented using a distributed computing framework like Apache Spark or Hadoop, and can be run on-premise or in the cloud. It can also be customized to meet specific requirements, such as using different algorithms for similarity detection, or adjusting the chunk size for optimal performance.


Yes, there are approaches using machine learning and artificial intelligence that can be used to find similar text data across petabytes of data in multiple storage systems. One such approach is to use a technique called "text embedding" or "vectorization".


Text embedding involves representing text data as numerical vectors that capture the meaning of the text. This is typically done by training a machine learning model, such as a neural network, on a large corpus of text data, such as Wikipedia or a news dataset. The model is trained to predict the context in which words appear, and in the process, it learns to represent words and phrases as vectors that capture their meaning.


Once the model is trained, it can be used to embed new text data, such as the text files in your petabyte-scale storage, into the same vector space. The vectors can then be compared to identify similar text data.


Here are the general steps to implement this approach:


Data preparation: Collect and preprocess the text data from your petabyte-scale storage. This may involve converting the data into a standard format, removing stop words and punctuation, and tokenizing the text into words or phrases.


Train an embedding model: Train a machine learning model, such as a Word2Vec or Doc2Vec model, on a large corpus of text data. The model should be trained to predict the context in which words or phrases appear.


Embed the text data: Use the trained model to embed the text data in your petabyte-scale storage into the same vector space as the training data.


Similarity search: Use algorithms such as nearest neighbors or cosine similarity to search for similar text data within the vector space.
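A minimal sketch of this step, assuming the vectors were already produced by some embedding model. The linear scan shown is exactly what an approximate nearest-neighbor index (e.g. HNSW) replaces at petabyte scale:

```ts
type EmbeddedDoc = { id: string; vec: number[] };

// Cosine similarity over dense embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na * nb) || 1);
}

// Brute-force k-nearest-neighbors: rank all docs by similarity to the query.
function nearest(query: number[], docs: EmbeddedDoc[], k = 5): EmbeddedDoc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vec) - cosine(query, x.vec))
    .slice(0, k);
}
```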


Visualization: Create a visualization tool to show the similar text data in a user-friendly way.


Reporting: Generate reports to provide information on the amount of space saved by identifying similar text data, and to highlight any potential data integrity issues.


Automation: Automate the system to periodically check for similar text data, so that new data is automatically checked and processed.


This approach can be implemented using a distributed computing framework like Apache Spark or Hadoop, and can be run on-premise or in the cloud. It can also be customized to meet specific requirements, such as using different embedding models or similarity search algorithms for optimal performance.





Monday, January 30, 2023

Sample Interview questions

Create a table component in React.
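One possible starting point, as a sketch: a generic, typed table component. The Column shape & prop names are illustrative, not a required API:

```tsx
import { ReactNode } from "react";

// Each column declares a header & how to render a cell for a row.
type Column<T> = { header: string; render: (row: T) => ReactNode };

function Table<T>({ columns, rows }: { columns: Column<T>[]; rows: T[] }) {
  return (
    <table>
      <thead>
        <tr>{columns.map((c) => <th key={c.header}>{c.header}</th>)}</tr>
      </thead>
      <tbody>
        {rows.map((row, i) => (
          // Index keys are fine for static data; prefer a stable row id otherwise.
          <tr key={i}>
            {columns.map((c) => <td key={c.header}>{c.render(row)}</td>)}
          </tr>
        ))}
      </tbody>
    </table>
  );
}

// Usage: <Table columns={[{ header: "Name", render: (u) => u.name }]} rows={users} />
```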

Create an analog clock component in html, css & React.

Practice interviews on DevSkiller.com.

Explain the CI/CD process for a front-end engineer & how you'd implement it with Kubernetes & Docker.



Return the path from the root node to the deepest node of a tree.
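A minimal recursive sketch for a binary tree; the TreeNode shape is assumed, & ties in depth are broken toward the left subtree:

```ts
interface TreeNode<T> { value: T; left?: TreeNode<T>; right?: TreeNode<T>; }

// Returns the values along the root-to-deepest-node path.
function deepestPath<T>(root: TreeNode<T> | undefined): T[] {
  if (!root) return [];
  const left = deepestPath(root.left);
  const right = deepestPath(root.right);
  const deeper = left.length >= right.length ? left : right;
  return [root.value, ...deeper];
}

// deepestPath({ value: 1, left: { value: 2, left: { value: 4 } }, right: { value: 3 } })
// -> [1, 2, 4]
```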

Free AI Chat tools

https://grok.com
https://x.com/i/grok
https://chatgpt.com
https://copilot.microsoft.com
https://chat.deepseek.com
https://www.meta.ai
https:...