Petroleum Engineer

Hello! My name is Daniel Mendoza and I am a petroleum engineer exploring the rich and fascinating world of machine learning and data science! Check out some of the interesting projects I have worked on and feel free to contact me if you have any questions!

All Posts

- Clustering with Gaussian Mixture Models

- 07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_Models

This is a brief demonstration of clustering with Gaussian mixture models in TensorFlow Probability.

- An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods

- 14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_Computation

This post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.
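As a rough illustration of the accept/reject step at the heart of Metropolis-Hastings, here is a minimal NumPy sketch. It is not the code from the post: the standard normal target, the step size, and the function name `metropolis_hastings` are assumptions chosen for illustration, and the symmetric Gaussian proposal makes this the plain random-walk Metropolis special case. Hamiltonian Monte Carlo is not shown.

```python
import numpy as np

def metropolis_hastings(log_prob, x0, n_samples, step, rng):
    """Random-walk Metropolis sampler for a 1-D target given its log density."""
    samples = np.empty(n_samples)
    x, logp = x0, log_prob(x0)
    for i in range(n_samples):
        # Symmetric Gaussian proposal centered on the current state.
        proposal = x + step * rng.normal()
        logp_prop = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x
    return samples

# Hypothetical target: an unnormalized standard normal log density.
rng = np.random.default_rng(0)
samples = metropolis_hastings(lambda x: -0.5 * x**2, 0.0, 20000, 1.0, rng)
```

The sampler only needs the log density up to an additive constant, which is why the normalizing term of the Gaussian is omitted.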

- Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability

- 19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation

This post is the first in a series on Markov chain Monte Carlo. It is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms with TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and its posterior predictive distribution. This task was selected because it poses a few difficulties whose solutions rely on TensorFlow Probability tools that can trip up new users. Additionally, the analytical results can be compared with the MCMC computations to ensure that the algorithms are working as intended.

- Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

- 26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm to a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from the Wattenberg field in Colorado and contains production data from over 4,000 active horizontal wells.

- Sentiment Classification with the Naive Bayes Algorithm

- 02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning concerned with developing programs that enable computers to process and analyze natural language data. Applications of NLP include speech recognition, text-to-speech, recommender systems, and document classification and summarization, among others. Given the vast amount of available text data, it is important to develop the skills to process and analyze it. This post explores the Naive Bayes algorithm applied to a document classification task.
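To give a feel for the idea, here is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words counts, written in plain Python. It is not the implementation from the post: the four toy documents, the whitespace tokenizer, the Laplace smoothing constant `alpha`, and the function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Count documents per class and words per class (bag of words)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                # Laplace-smoothed estimate of P(word | class).
                score += math.log(
                    (word_counts[label][tok] + alpha)
                    / (total_words + alpha * len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data, invented for this sketch.
docs = [
    "great movie loved it",
    "terrible plot hated it",
    "loved the acting",
    "hated the ending terrible",
]
labels = ["pos", "neg", "pos", "neg"]
model = train_nb(docs, labels)
```

The "naive" part is the assumption that words are conditionally independent given the class, which is what lets the score be a simple sum of per-word log probabilities.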

- Data Scraping and Visualization with Dash

- 02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interest. This data comes in a variety of formats that are not always amenable to analysis and modeling, so more meticulous methods are often required to gather and structure it. Fortunately, several Python libraries are intended to solve this issue and thus unlock these datasets for anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries, along with a brief introduction to dynamic data visualization.

- Multivariate Gaussian Overview and Applications

- 16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the mathematical derivations associated with the multivariate Gaussian, along with applications in Python.

- Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow

- 19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis

Bayesian linear regression differs from point estimation in that the posterior distribution over the parameters and the predictive distribution are computed. This provides additional information because the uncertainty associated with the parameters and with the model's predictions is quantified. To obtain an analytical solution for the posterior, one is restricted to a class of priors known as conjugate priors. A conjugate prior yields a posterior of the same form as the prior, so its parameters can be computed in closed form. When non-conjugate priors are desired, the posterior parameters can no longer be derived analytically. One must then either stay with conjugate priors or turn to other methods, such as Markov chain Monte Carlo, to estimate the posterior. This post shows how to compute the posterior distribution using a conjugate prior over the model parameters when the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Monte Carlo algorithm is implemented with the TensorFlow Probability library to replicate the results of the conjugate analysis, both to become familiar with Markov chain Monte Carlo methods and to show how they may be extended when a conjugate prior is not selected.
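For the known-variance case described above, a Gaussian prior over the weights is conjugate to the Gaussian likelihood, and the posterior is again Gaussian. The NumPy sketch below shows that closed-form update under assumed settings: the synthetic data, the prior `N(0, 10 I)`, the noise variance, and the function name `conjugate_posterior` are illustrative choices, not taken from the post, and the MCMC side is not shown.

```python
import numpy as np

def conjugate_posterior(X, y, noise_var, prior_mean, prior_cov):
    """Gaussian posterior over w for y = X w + eps, eps ~ N(0, noise_var)."""
    prior_prec = np.linalg.inv(prior_cov)
    # Posterior precision = prior precision + data precision.
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    # Posterior mean is a precision-weighted blend of the prior mean and the data.
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

# Hypothetical synthetic dataset: intercept 1.0, slope -2.0, known noise variance.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
noise_var = 0.25
X = np.column_stack([np.ones(200), rng.uniform(-1, 1, 200)])
y = X @ true_w + rng.normal(0, np.sqrt(noise_var), 200)

post_mean, post_cov = conjugate_posterior(
    X, y, noise_var, prior_mean=np.zeros(2), prior_cov=10.0 * np.eye(2)
)
```

With a zero-mean isotropic prior, the posterior mean coincides with the ridge regression estimate with penalty `noise_var / prior_variance`, which is a convenient sanity check for any sampler meant to reproduce the same posterior.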

- Predict User Advertisement Click Behavior

- 03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset composed of numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.

- A Probabilistic Approach to Linear Regression

- 03 Jan 2022 - Python, Linear_Regression, and Regularization

Linear regression is a core method used in statistics and machine learning to construct models for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easy to comprehend, its statistical derivation is often not well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. This post explores the statistical perspective of linear regression and applies it to real data.
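The link between least squares and the statistical view can be checked numerically: under Gaussian noise, the negative log-likelihood is the residual sum of squares up to constants, so the least-squares solution is also the maximum likelihood estimate. The sketch below uses made-up synthetic data rather than the real data from the post, and the ridge penalty `lam` is an arbitrary illustrative value showing how a Gaussian prior on the weights leads to L2 regularization.

```python
import numpy as np

def gaussian_nll(w, X, y, noise_var):
    """Negative log-likelihood of y under y = X w + eps, eps ~ N(0, noise_var)."""
    resid = y - X @ w
    n = y.size
    return 0.5 * n * np.log(2 * np.pi * noise_var) + 0.5 * resid @ resid / noise_var

# Hypothetical synthetic data: intercept 0.5, slope 2.0, noise std 0.3.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([0.5, 2.0]) + rng.normal(scale=0.3, size=100)

# Least-squares estimate, which is also the Gaussian maximum likelihood estimate.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge estimate: the MAP solution under a zero-mean Gaussian prior on w.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

Evaluating `gaussian_nll` at `w_ols` and at any perturbed weight vector shows the likelihood view and the least-squares view agree on the optimum, while `w_ridge` is pulled toward zero by the prior.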

- Introduction to Dynamic Mode Decomposition

- 16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods

Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique for characterizing dynamical systems from high-dimensional data. In the era of big data, integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD for discovering rich insights from complex dynamical systems. These approaches have the potential to revolutionize the understanding, prediction, and control of such systems. This post gives a brief introduction to DMD.
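As a rough idea of what the algorithm computes, the sketch below implements the common SVD-based ("exact") DMD formulation in NumPy: it fits a best-fit linear operator between consecutive snapshots and returns its eigenvalues and modes. The function name `dmd`, the truncation rank, and the 2-by-2 test system are assumptions made for illustration and are not taken from the post.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD: eigenvalues and modes of the best-fit linear map X -> Y."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    # Truncated SVD of the first snapshot matrix.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator A_tilde = U^H Y V S^{-1} (dividing by s scales the columns).
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: Y V S^{-1} W.
    modes = Y @ Vh.conj().T / s @ W
    return eigvals, modes

# Hypothetical linear system x_{k+1} = A x_k with eigenvalues 0.9 and 0.5.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
cols = [np.array([1.0, 1.0])]
for _ in range(9):
    cols.append(A @ cols[-1])
snapshots = np.column_stack(cols)

eigvals, modes = dmd(snapshots, rank=2)
```

On this noise-free linear example the DMD eigenvalues should match those of `A`; on real data the rank is a tuning choice that trades detail against noise.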

- Introduction to Sparse Identification of Nonlinear Dynamics

- 16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression

The sparse identification of nonlinear dynamics (SINDy) is a method for identifying the governing equations of dynamical systems from measurement data. It relies on the assumption that many dynamical systems have only a few contributing terms within a high-dimensional space of nonlinear functions. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate its application.
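The sparsity assumption can be sketched with sequentially thresholded least squares over a library of candidate terms. The example below is a simplified illustration, not the post's code: it assumes the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), a quadratic polynomial library, a threshold of 0.1, and, as a deliberate shortcut, exact derivatives evaluated from the known equations instead of derivatives estimated from measurements.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(f, x0, dt, steps):
    """Integrate dx/dt = f(x) with classical fourth-order Runge-Kutta."""
    traj = np.empty((steps, len(x0)))
    traj[0] = x0
    for k in range(steps - 1):
        s = traj[k]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

# Candidate terms, in the column order used by library().
NAMES = ["1", "x", "y", "z", "xx", "xy", "xz", "yy", "yz", "zz"]

def library(X):
    """Quadratic polynomial library evaluated at each state."""
    x, y, z = X.T
    return np.column_stack([np.ones_like(x), x, y, z,
                            x * x, x * y, x * z, y * y, y * z, z * z])

def stlsq(Theta, dXdt, threshold, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], dXdt[:, j], rcond=None)[0]
    return Xi

traj = rk4_trajectory(lorenz, np.array([-8.0, 7.0, 27.0]), dt=0.01, steps=2000)
dXdt = np.array([lorenz(s) for s in traj])  # exact derivatives (simplification)
Xi = stlsq(library(traj), dXdt, threshold=0.1)
```

Each column of `Xi` holds the coefficients of one state equation, with rows ordered as in `NAMES`; the nonzero entries indicate which library terms were retained. With measured data, the derivatives would have to be estimated numerically, and noise makes the choice of threshold more delicate.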

Posts tagged "API"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Advanced_Bayesian_Computation"

An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods

14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_Computation

This post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.

Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability

19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation

This post is the first in a series on Markov chain Monte Carlo. This is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and estimate the posterior predictive distribution. This task was selected because it has a few difficulties that require solutions using TensorFlow Probability's available tools which can cause new users difficulties. Additionally, the analyical results can be compared to the MCMC computations to assure that the algorithms are working as intended.

Posts tagged "Anomaly_Detection"

Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.

Posts tagged "Bag_of_Words"

Sentiment Classification with the Naive Bayes Algorithm

02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.

Posts tagged "Bayesian_Linear_Regression"

Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow

19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis

Bayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.

Posts tagged "Classification"

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.

Predict User Advertisement Click Behavior

03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.

Posts tagged "Conjugate_Analysis"

Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow

19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis

Bayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.

Posts tagged "Dash"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Data-Driven_Methods"

Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.

Introduction to Dynamic Mode Decomposition

16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods

Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold a new potential to revolutionize the understanding, predictability, and control of these systems. A brief introduction of the DMD is the topic of this post.

Introduction to Sparse Identification of Nonlinear Dynamics

16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression

The sparse identification of nonlinear dynamics (SINDy) is a method used to identify governing equations from dynamical systems using measurement data. This method relies on the assumption that many dynamical systems have few contributing terms that exist within high-dimensional nonlinear function space. The SINDy algorithm is applied on the Lorenz system of equations to demonstrate a general understanding of its application.

Posts tagged "Document_Classification"

Sentiment Classification with the Naive Bayes Algorithm

02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.

Posts tagged "Dynamic_Mode_Decomposition"

Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.

Posts tagged "Dynamic_Mode_Decompostion"

Introduction to Dynamic Mode Decomposition

16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods

Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold a new potential to revolutionize the understanding, predictability, and control of these systems. A brief introduction of the DMD is the topic of this post.

Posts tagged "Feature_Importance"

Predict User Advertisement Click Behavior

03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.

Posts tagged "Latent_Variable_Models"

Clustering with Gaussian Mixture Models

07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_Models

This is a brief demonstration on clustering using Gaussian Mixture Models on Tensorflow_Probability

Posts tagged "Linear_Regression"

A Probabilistic Approach to Linear Regression

03 Jan 2022 - Python, Linear_Regression, and Regularization

Linear Regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is not often well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.

Posts tagged "Logistic_Regression"

Predict User Advertisement Click Behavior

03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.

Posts tagged "Markov_Chain_Monte_Carlo"

Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow

19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis

Bayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.

Posts tagged "Markov_chain_Monte_Carlo"

Clustering with Gaussian Mixture Models

07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_Models

This is a brief demonstration on clustering using Gaussian Mixture Models on Tensorflow_Probability

An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods

14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_Computation

This post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.

Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability

19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation

This post is the first in a series on Markov chain Monte Carlo. This is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and estimate the posterior predictive distribution. This task was selected because it has a few difficulties that require solutions using TensorFlow Probability's available tools which can cause new users difficulties. Additionally, the analyical results can be compared to the MCMC computations to assure that the algorithms are working as intended.

Posts tagged "Multivariate_Gaussian_Distribution"

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.

Posts tagged "Naive_Bayes"

Sentiment Classification with the Naive Bayes Algorithm

02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.

Posts tagged "Natural_Languange_Processing"

Sentiment Classification with the Naive Bayes Algorithm

02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.

Posts tagged "Parameter_Inference"

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.

Posts tagged "Plotly"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Python"

Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.

Sentiment Classification with the Naive Bayes Algorithm

02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words

Natural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.

Predict User Advertisement Click Behavior

03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset composed of numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.

A Probabilistic Approach to Linear Regression

03 Jan 2022 - Python, Linear_Regression, and Regularization

Linear Regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is not often well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.
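
As a minimal sketch of the least-squares idea described above (not the code from the post itself), the snippet below fits a linear model with NumPy by minimizing the residual sum of squares. The function names `fit_least_squares` and `residual_sum_of_squares`, the synthetic data, and the coefficient values are all assumptions made for illustration.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the first coefficient acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_least_squares(X, y):
    """Return the coefficients that minimize the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta


def residual_sum_of_squares(X, y, beta):
    """Sum of squared differences between observed and predicted values."""
    residuals = y - add_intercept(X) @ beta
    return float(residuals @ residuals)


# Hypothetical synthetic data: y = 1 + 2*x1 - 3*x2 + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([1.0, 2.0, -3.0])
y = add_intercept(X) @ true_beta + rng.normal(scale=0.1, size=200)

beta_hat = fit_least_squares(X, y)
rss_hat = residual_sum_of_squares(X, y, beta_hat)
rss_true = residual_sum_of_squares(X, y, true_beta)
print("estimated coefficients:", beta_hat)
print("RSS at estimate:", rss_hat, "RSS at true coefficients:", rss_true)
```

Under a Gaussian noise model, minimizing this quantity is equivalent to maximizing the likelihood, which is the statistical perspective the post refers to.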

Introduction to Dynamic Mode Decomposition

16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods

Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold new potential to revolutionize the understanding, predictability, and control of these systems. This post gives a brief introduction to DMD.

Introduction to Sparse Identification of Nonlinear Dynamics

16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression

The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have only a few contributing terms within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.

Posts tagged "Quadratic_Discriminant_Analysis"

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the mathematical derivations associated with the multivariate Gaussian, along with applications in Python.

Posts tagged "Regularization"

A Probabilistic Approach to Linear Regression

03 Jan 2022 - Python, Linear_Regression, and Regularization

Linear Regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is not often well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.

Posts tagged "Regularized_Discriminant_Analysis"

Predict User Advertisement Click Behavior

03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance

In this post, a dataset composed of numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.

Posts tagged "Selenium"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, several Python libraries are intended to solve this issue and thus unlock these datasets to anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Sparse_Identification_of_Nonlinear_Dynamics"

Introduction to Sparse Identification of Nonlinear Dynamics

16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression

The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have only a few contributing terms within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.

Posts tagged "Sparse_Regression"

Introduction to Sparse Identification of Nonlinear Dynamics

16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression

The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have only a few contributing terms within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.

Posts tagged "TensorFlow_Probability"

Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability

19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation

This post is the first in a series on Markov chain Monte Carlo. It is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and the posterior predictive distribution. This task was selected because it presents a few difficulties whose solutions rely on TensorFlow Probability tools that new users may find challenging. Additionally, the analytical results can be compared to the MCMC computations to ensure that the algorithms are working as intended.

Posts tagged "Tensorflow"

Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow

19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis

Bayesian linear regression differs from point estimation in that the posterior distribution over the parameters and the predictive distribution are computed. This provides additional information because the uncertainty associated with the parameters and with predictions made from the model is well-defined. To obtain an analytical solution for the posterior distribution, one is restricted to a class of priors known as conjugate priors. A conjugate prior yields a posterior distribution of the same form as the prior, so the posterior parameters can be computed in closed form. When non-conjugate priors are desired, the parameters of the posterior can no longer be derived analytically. One must then either stay with conjugate priors or use other methods, such as Markov chain Monte Carlo, to estimate the posterior distribution. This post shows how to compute the posterior distribution using a conjugate prior over the model parameters when the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Monte Carlo algorithm is implemented using the TensorFlow Probability library to replicate the results of the conjugate analysis, both to become familiar with Markov chain Monte Carlo methods and to show how they may be extended when a conjugate prior is not selected.
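
To make the conjugate case concrete, here is a minimal sketch assuming a Gaussian prior N(m0, S0) over the weights and a known noise variance, in which case the posterior is also Gaussian with a closed-form mean and covariance. This is the standard textbook update and not necessarily the exact parameterization used in the post; the function name `gaussian_posterior`, the prior settings, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def gaussian_posterior(X, y, sigma2, m0, S0):
    """Posterior N(mN, SN) over weights for y = X w + noise, noise ~ N(0, sigma2).

    Assumes a Gaussian prior N(m0, S0) on w and a known noise variance sigma2.
    """
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + X.T @ X / sigma2)  # posterior covariance
    mN = SN @ (S0_inv @ m0 + X.T @ y / sigma2)     # posterior mean
    return mN, SN


# Hypothetical synthetic dataset with known noise variance.
rng = np.random.default_rng(1)
n, sigma2 = 500, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_w = np.array([0.5, -1.5])
y = X @ true_w + rng.normal(scale=np.sqrt(sigma2), size=n)

# Broad zero-mean prior (an illustrative choice).
m0 = np.zeros(2)
S0 = 10.0 * np.eye(2)

mN, SN = gaussian_posterior(X, y, sigma2, m0, S0)
print("posterior mean:", mN)
print("posterior standard deviations:", np.sqrt(np.diag(SN)))
```

An MCMC sampler targeting the same model should produce draws whose mean and covariance approach `mN` and `SN`, which is the kind of check the conjugate analysis makes possible.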

Posts tagged "Time_Series_Forecast"

Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination

26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods

This post explores the application of the DMD algorithm to a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from the Wattenberg field in Colorado and contains production data from over 4,000 active horizontal wells.

Posts tagged "Web_App"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, several Python libraries are intended to solve this issue and thus unlock these datasets to anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Web_Scrapping"

Data Scraping and Visualization with Dash

02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly

The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, several Python libraries are intended to solve this issue and thus unlock these datasets to anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.

Posts tagged "Wishart_Distribution"

Multivariate Gaussian Overview and Applications

16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference

This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the mathematical derivations associated with the multivariate Gaussian, along with applications in Python.