Hello! My name is Daniel Mendoza and I am a petroleum engineer exploring the rich and fascinating world of machine learning and data science! Check out some of the interesting projects I have worked on and feel free to contact me if you have any questions!
Categories
All Posts API Advanced_Bayesian_Computation Anomaly_Detection Bag_of_Words Bayesian_Linear_Regression Classification Conjugate_Analysis Dash Data-Driven_Methods Document_Classification Dynamic_Mode_Decomposition Dynamic_Mode_Decompostion Feature_Importance Latent_Variable_Models Linear_Regression Logistic_Regression Markov_Chain_Monte_Carlo Markov_chain_Monte_Carlo Multivariate_Gaussian_Distribution Naive_Bayes Natural_Languange_Processing Parameter_Inference Plotly Python Quadratic_Discriminant_Analysis Regularization Regularized_Discriminant_Analysis Selenium Sparse_Identification_of_Nonlinear_Dynamics Sparse_Regression TensorFlow_Probability Tensorflow Time_Series_Forecast Web_App Web_Scrapping Wishart_Distribution
All Posts
- Clustering with Gaussian Mixture Models
- 07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_Models
  This is a brief demonstration of clustering with Gaussian mixture models using TensorFlow Probability.

- An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods
- 14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_Computation
  This post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.
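
  As a rough illustration of the algorithmic idea behind Metropolis-Hastings, here is a minimal random-walk sketch in plain Python. It is not taken from the post: the one-dimensional standard normal target, the function names `log_target` and `metropolis_hastings`, and the step size are all assumptions chosen for brevity.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis_hastings(log_prob, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # With a symmetric proposal the acceptance ratio is just the target ratio.
        log_alpha = log_prob(proposal) - log_prob(x)
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = proposal  # accept the move
        samples.append(x)  # on rejection the current state is repeated
    return samples


samples = metropolis_hastings(log_target, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean: {mean:.3f}, sample variance: {var:.3f}")
```

  The sample mean and variance should land close to 0 and 1 for this target. Hamiltonian Monte Carlo replaces the random-walk proposal with one guided by gradients of the log density, which this sketch does not attempt.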

- Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability
- 19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation
  This post is the first in a series on Markov chain Monte Carlo. It is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and its posterior predictive distribution. This task was selected because it presents a few difficulties that must be solved with TensorFlow Probability's available tools, which can be challenging for new users. Additionally, the analytical results can be compared with the MCMC computations to ensure that the algorithms are working as intended.

- Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
- 26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods
  This post explores the application of the DMD algorithm to a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from the Wattenberg field in Colorado and contains production data from over 4,000 active horizontal wells.

- Sentiment Classification with the Naive Bayes Algorithm
- 02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_Words
  Natural language processing (NLP) is an important branch of machine learning concerned with developing programs that enable computers to process and analyze natural language data. Applications of NLP include speech recognition, text-to-speech, recommender systems, and document classification and summarization, among others. Given the vast sources of available text data, it is important to develop the skills to process and analyze it. This post explores the Naive Bayes algorithm applied to a document classification task.
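
  To make the bag-of-words Naive Bayes idea concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is an illustration only, not the post's implementation: the toy documents, the labels, and the helper names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()  # bag of words: order is ignored
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
toy_docs = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]
model = train_naive_bayes(toy_docs)
print(predict(model, "great acting"))        # pos
print(predict(model, "awful boring movie"))  # neg
```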

- Data Scraping and Visualization with Dash
- 02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly
  The internet is a rich source of data waiting to be explored for research or personal interest. This data comes in a variety of formats that are not always amenable to analysis and modeling, and more meticulous methods are often required to gather and structure it. Fortunately, several Python libraries are intended to solve this problem and thus unlock these datasets for anyone who takes the time to explore them. This post gives a quick demonstration of these Python libraries as well as a brief introduction to dynamic data visualization.

- Multivariate Gaussian Overview and Applications
- 16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference
  This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the mathematical derivations associated with the multivariate Gaussian, along with applications in Python.

- Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow
- 19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis
  Bayesian linear regression differs from point estimation in that the posterior distribution over the parameters and the predictive distribution are computed. This provides additional information because the uncertainty associated with the parameters and with the model's estimates is well defined. To obtain an analytical solution for the posterior distribution, one is restricted to a class of priors known as conjugate priors. A conjugate prior yields a posterior distribution of the same form as the prior, so the parameters of the posterior can be computed in closed form. When non-conjugate priors are desired, the parameters of the posterior can no longer be derived analytically. One must then either stay with conjugate priors or turn to other methods, such as Markov chain Monte Carlo, to estimate the posterior distribution. This post shows how to compute the posterior distribution using a conjugate prior over the model parameters when the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Monte Carlo algorithm is implemented with the TensorFlow Probability library to replicate the results of the conjugate analysis, both to become familiar with Markov chain Monte Carlo methods and to show how they may be extended when a conjugate prior is not selected.
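
  The closed-form update that a conjugate prior allows can be sketched for the simplest case: a single slope `w` in `y = w * x + noise`, with a Gaussian prior on `w` and a known noise variance. This one-parameter setup, the function name `posterior_slope`, and the numbers below are assumptions made for illustration; the post itself may work with a more general model.

```python
def posterior_slope(xs, ys, noise_var, prior_mean, prior_var):
    """Gaussian posterior (mean, variance) over the slope w in y = w * x + noise.

    Assumes the prior w ~ N(prior_mean, prior_var) and a known noise variance,
    so the posterior is again Gaussian (the conjugate case).
    """
    # Precisions add: prior precision plus the precision contributed by the data.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    # The posterior mean is a precision-weighted blend of prior mean and data.
    weighted = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    return weighted / precision, 1.0 / precision


# Hypothetical synthetic data scattered around y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
post_mean, post_var = posterior_slope(
    xs, ys, noise_var=1.0, prior_mean=0.0, prior_var=100.0
)
print(f"posterior mean: {post_mean:.4f}, posterior variance: {post_var:.4f}")
```

  With a weak prior such as this one, the posterior mean sits near the least-squares slope, and the posterior variance is far smaller than the prior variance.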

- Predict User Advertisement Click Behavior
- 03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance
  In this post, a dataset composed of numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.

- A Probabilistic Approach to Linear Regression
- 03 Jan 2022 - Python, Linear_Regression, and Regularization
  Linear regression is a core method in statistics and machine learning for constructing models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easy to grasp, its statistical derivation is often not well understood. Understanding the statistical perspective of linear regression is fundamental to understanding model regularization and further topics in machine learning. This post explores that perspective and implements it on real data.
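
  One way to see the statistical perspective is that, under Gaussian noise with a fixed variance, the negative log-likelihood equals a constant plus the residual sum of squares divided by twice the variance, so the least-squares fit is also the maximum-likelihood fit. The sketch below checks this numerically on a small made-up dataset; the data and the function names are illustrative assumptions, not the post's code.

```python
import math


def ols_fit(xs, ys):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def gaussian_nll(xs, ys, intercept, slope, sigma2=1.0):
    """Negative log-likelihood of y ~ N(intercept + slope * x, sigma2)."""
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 0.5 * len(xs) * math.log(2 * math.pi * sigma2) + rss / (2 * sigma2)


# Hypothetical data, roughly y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = ols_fit(xs, ys)
nll_at_ols = gaussian_nll(xs, ys, intercept, slope)
# Any other parameter choice gives a larger negative log-likelihood.
nll_perturbed = gaussian_nll(xs, ys, intercept + 0.3, slope - 0.1)
print(f"intercept: {intercept:.2f}, slope: {slope:.2f}")
print(nll_at_ols < nll_perturbed)
```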

- Introduction to Dynamic Mode Decomposition
- 16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods
  Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a powerful modern technique for characterizing dynamical systems from high-dimensional data. In the era of big data, integration with modern scientific computing and machine learning is rapidly increasing the popularity of data-driven approaches like DMD for discovering rich insights into complex dynamical systems. These approaches have the potential to revolutionize the understanding, predictability, and control of such systems. This post gives a brief introduction to DMD.

- Introduction to Sparse Identification of Nonlinear Dynamics
- 16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression
  The sparse identification of nonlinear dynamics (SINDy) is a method for identifying the governing equations of dynamical systems from measurement data. It relies on the assumption that many dynamical systems have only a few contributing terms within a high-dimensional space of nonlinear functions. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate its use.

Posts tagged "API"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and PlotlyThe internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Advanced_Bayesian_Computation"
An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods
14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_ComputationThis post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.
Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability
19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_ComputationThis post is the first in a series on Markov chain Monte Carlo. This is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and estimate the posterior predictive distribution. This task was selected because it has a few difficulties that require solutions using TensorFlow Probability's available tools which can cause new users difficulties. Additionally, the analyical results can be compared to the MCMC computations to assure that the algorithms are working as intended.
Posts tagged "Anomaly_Detection"
Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_MethodsThis post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.
Posts tagged "Bag_of_Words"
Sentiment Classification with the Naive Bayes Algorithm
02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_WordsNatural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.
Posts tagged "Bayesian_Linear_Regression"
Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow
19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_AnalysisBayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.
Posts tagged "Classification"
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_InferenceThis post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.
Predict User Advertisement Click Behavior
03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_ImportanceIn this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.
Posts tagged "Conjugate_Analysis"
Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow
19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_AnalysisBayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.
Posts tagged "Dash"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and PlotlyThe internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Data-Driven_Methods"
Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_MethodsThis post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.
Introduction to Dynamic Mode Decomposition
16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_MethodsOriginally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold a new potential to revolutionize the understanding, predictability, and control of these systems. A brief introduction of the DMD is the topic of this post.
Introduction to Sparse Identification of Nonlinear Dynamics
16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_RegressionThe sparse identification of nonlinear dynamics (SINDy) is a method used to identify governing equations from dynamical systems using measurement data. This method relies on the assumption that many dynamical systems have few contributing terms that exist within high-dimensional nonlinear function space. The SINDy algorithm is applied on the Lorenz system of equations to demonstrate a general understanding of its application.
Posts tagged "Document_Classification"
Sentiment Classification with the Naive Bayes Algorithm
02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_WordsNatural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.
Posts tagged "Dynamic_Mode_Decomposition"
Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_MethodsThis post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.
Posts tagged "Dynamic_Mode_Decompostion"
Introduction to Dynamic Mode Decomposition
16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_MethodsOriginally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold a new potential to revolutionize the understanding, predictability, and control of these systems. A brief introduction of the DMD is the topic of this post.
Posts tagged "Feature_Importance"
Predict User Advertisement Click Behavior
03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_ImportanceIn this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.
Posts tagged "Latent_Variable_Models"
Clustering with Gaussian Mixture Models
07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_ModelsThis is a brief demonstration on clustering using Gaussian Mixture Models on Tensorflow_Probability
Posts tagged "Linear_Regression"
A Probabilistic Approach to Linear Regression
03 Jan 2022 - Python, Linear_Regression, and RegularizationLinear Regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is not often well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.
Posts tagged "Logistic_Regression"
Predict User Advertisement Click Behavior
03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_ImportanceIn this post, a dataset comprised of numeric and categorical features will be analyzed to determine if it is possible to predict whether an individiual is likekly to click on an online advertisement.
Posts tagged "Markov_Chain_Monte_Carlo"
Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow
19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_AnalysisBayesian linear regression differs from point estimates in that the posterior distribution over the parameters and the predictive distribution are computed. This provides an additional component of information because the uncertainty associated with the parameters and estimates made from the model are well-defined. To obtain an analytical solution of the posterior distributions, one is restricted to selecting a class of priors that are known as conjugate priors. This results in a posterior distribution of the same form of the prior and thus the distribution parameters can be computed. When non-conjugate priors are desired to model parameters, the ability to analytically derive the parameters of the posterior is no longer possible. This forces one to be restricted to conjugate priors or explore other methods like Markov chain Monte Carlo to estimate posterior distributions. This post displays how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Markov Chain algorithm will also be implemented using the Tensorflow Probability library to replicate the same results from the conjugate analysis to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.
Posts tagged "Markov_chain_Monte_Carlo"
Clustering with Gaussian Mixture Models
07 Feb 2023 - Markov_chain_Monte_Carlo and Latent_Variable_ModelsThis is a brief demonstration on clustering using Gaussian Mixture Models on Tensorflow_Probability
An Analytical and Algorithmic Description of Metropolis-Hastings and Hamiltonian Monte Carlo Methods
14 Jan 2023 - Markov_chain_Monte_Carlo and Advanced_Bayesian_ComputationThis post provides the analytical and algorithmic details behind the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.
Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability
19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_ComputationThis post is the first in a series on Markov chain Monte Carlo. This is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and estimate the posterior predictive distribution. This task was selected because it has a few difficulties that require solutions using TensorFlow Probability's available tools which can cause new users difficulties. Additionally, the analyical results can be compared to the MCMC computations to assure that the algorithms are working as intended.
Posts tagged "Multivariate_Gaussian_Distribution"
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_InferenceThis post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.
Posts tagged "Naive_Bayes"
Sentiment Classification with the Naive Bayes Algorithm
02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_WordsNatural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.
Posts tagged "Natural_Languange_Processing"
Sentiment Classification with the Naive Bayes Algorithm
02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_WordsNatural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.
Posts tagged "Parameter_Inference"
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_InferenceThis post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.
Posts tagged "Plotly"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and PlotlyThe internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Python"
Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_MethodsThis post explores the application of the DMD algorithm on a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from Wattenberg field in Colorado and contains production data form over 4,000 active horizontal wells.
Sentiment Classification with the Naive Bayes Algorithm
02 May 2022 - Python, Natural_Languange_Processing, Document_Classification, Naive_Bayes, and Bag_of_WordsNatural language processing (NLP) is an important branch of machine learning that is concerned with developing programs that enable computers the ability to process and analyze natual language data. Applications of NLP tasks can be observed in speech recognition, text-to-speech, recommender systems, document classification and summerization along with various other examples. Given the vast sources of available text data, it is important to develop the skills to process and analyze text data. This post explores the Naive Bayes algorithm applied on a document classification task.
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and PlotlyThe internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several libraries in Python that are intended to solve this issue and thus unlock these datasets to anyone who takes time to explore these libraries. This post demonstrates a quick use of these python libraries as well as a brief introduction to dynamic data visualization.
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_InferenceThis post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. This will be presented by showing many of the required mathematical derivations associated with the multivariate Gaussian and applications on python.
Predict User Advertisement Click Behavior
03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance
In this post, a dataset comprising numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.
A Probabilistic Approach to Linear Regression
03 Jan 2022 - Python, Linear_Regression, and Regularization
Linear regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is often not well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.
Introduction to Dynamic Mode Decomposition
16 Nov 2021 - Python, Dynamic_Mode_Decompostion, and Data-Driven_Methods
Originally developed within the fluid dynamics community, dynamic mode decomposition (DMD) has become a modern, powerful technique used to characterize dynamical systems from high-dimensional data. In the era of big data, the integration with modern scientific computation and machine learning is rapidly increasing the popularity of data-driven approaches like DMD to discover rich insights from complex dynamical systems. These modern data-driven approaches hold new potential to revolutionize the understanding, predictability, and control of these systems. A brief introduction to DMD is the topic of this post.
Introduction to Sparse Identification of Nonlinear Dynamics
16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression
The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have few contributing terms that exist within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.
Posts tagged "Quadratic_Discriminant_Analysis"
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference
This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the required mathematical derivations associated with the multivariate Gaussian, along with applications in Python.
Posts tagged "Regularization"
A Probabilistic Approach to Linear Regression
03 Jan 2022 - Python, Linear_Regression, and Regularization
Linear regression is a core method used in statistics and machine learning to construct models that can be used for quantitative prediction and parameter inference. The method of least squares is typically used to estimate the model parameters and is implemented by minimizing the residual sum of squares between the model's predicted values and the true values. Although minimizing the residual sum of squares is easily comprehended, its statistical derivation is often not well understood. Understanding the statistical perspective of linear regression is fundamental for understanding model regularization and further topics in machine learning. The statistical perspective of linear regression is explored and implemented on real data.
Posts tagged "Regularized_Discriminant_Analysis"
Predict User Advertisement Click Behavior
03 Feb 2022 - Python, Logistic_Regression, Classification, Regularized_Discriminant_Analysis, and Feature_Importance
In this post, a dataset comprising numeric and categorical features is analyzed to determine whether it is possible to predict if an individual is likely to click on an online advertisement.
Posts tagged "Selenium"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly
The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several Python libraries intended to solve this issue, unlocking these datasets for anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Sparse_Identification_of_Nonlinear_Dynamics"
Introduction to Sparse Identification of Nonlinear Dynamics
16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression
The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have few contributing terms that exist within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.
Posts tagged "Sparse_Regression"
Introduction to Sparse Identification of Nonlinear Dynamics
16 Nov 2021 - Python, Sparse_Identification_of_Nonlinear_Dynamics, Data-Driven_Methods, and Sparse_Regression
The sparse identification of nonlinear dynamics (SINDy) is a method used to identify the governing equations of dynamical systems from measurement data. This method relies on the assumption that many dynamical systems have few contributing terms that exist within a high-dimensional nonlinear function space. The SINDy algorithm is applied to the Lorenz system of equations to demonstrate a general understanding of its application.
Posts tagged "TensorFlow_Probability"
Implementing Metropolis-Hastings and Hamiltonian Monte Carlo on TensorFlow Probability
19 Dec 2022 - Markov_chain_Monte_Carlo, TensorFlow_Probability, and Advanced_Bayesian_Computation
This post is the first in a series on Markov chain Monte Carlo. It is a tutorial on implementing the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms using TensorFlow Probability. The main task is to estimate the parameters of a multivariate Gaussian distribution and to estimate the posterior predictive distribution. This task was selected because it presents a few difficulties that must be solved with TensorFlow Probability's available tools, which can be challenging for new users. Additionally, the analytical results can be compared to the MCMC computations to ensure that the algorithms are working as intended.
Posts tagged "Tensorflow"
Bayesian Linear Regression: Conjugate Analysis and MCMC with Tensorflow
19 Feb 2022 - Tensorflow, Bayesian_Linear_Regression, Markov_Chain_Monte_Carlo, and Conjugate_Analysis
Bayesian linear regression differs from point estimation in that the posterior distribution over the parameters and the predictive distribution are computed. This provides additional information because the uncertainty associated with the parameters and with estimates made from the model is well-defined. To obtain an analytical solution for the posterior distribution, one is restricted to selecting from a class of priors known as conjugate priors. This results in a posterior distribution of the same form as the prior, and thus the distribution parameters can be computed. When non-conjugate priors are desired, it is no longer possible to analytically derive the parameters of the posterior. One must then either stay with conjugate priors or explore other methods, such as Markov chain Monte Carlo, to estimate posterior distributions. This post shows how to compute the posterior distribution using a conjugate prior over the model parameters in the scenario in which the variance of a synthetic dataset is assumed to be known. Additionally, the Hamiltonian Monte Carlo algorithm is implemented using the TensorFlow Probability library to replicate the results of the conjugate analysis, as a way to become familiar with Markov chain Monte Carlo methods and how they may be extended when a conjugate prior is not selected.
Posts tagged "Time_Series_Forecast"
Dynamic Mode Decomposition Hydrocarbon Predictive Analytics, Anomaly Detection, and Productivity Determination
26 Jul 2022 - Python, Dynamic_Mode_Decomposition, Time_Series_Forecast, Anomaly_Detection, and Data-Driven_Methods
This post explores the application of the DMD algorithm to a large hydrocarbon production dataset to demonstrate its potential in time-series predictive analytics, anomaly detection, and productivity quantification. The dataset was obtained from the Wattenberg field in Colorado and contains production data from over 4,000 active horizontal wells.
Posts tagged "Web_App"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly
The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several Python libraries intended to solve this issue, unlocking these datasets for anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Web_Scrapping"
Data Scraping and Visualization with Dash
02 Apr 2022 - Python, Web_Scrapping, Selenium, Dash, API, Web_App, and Plotly
The internet is a rich source of data waiting to be explored for research or personal interests. This data is encountered in a variety of formats which may not always be amenable to analysis and modeling. It is often the case that more meticulous methods are required to gather and structure this data. Fortunately, there are several Python libraries intended to solve this issue, unlocking these datasets for anyone who takes the time to explore them. This post demonstrates a quick use of these Python libraries as well as a brief introduction to dynamic data visualization.
Posts tagged "Wishart_Distribution"
Multivariate Gaussian Overview and Applications
16 Mar 2022 - Python, Multivariate_Gaussian_Distribution, Wishart_Distribution, Quadratic_Discriminant_Analysis, Classification, and Parameter_Inference
This post explores the multivariate Gaussian distribution and some common applications. Given its use in many statistical analyses and machine learning applications, it is important that this distribution is well understood. The post presents many of the required mathematical derivations associated with the multivariate Gaussian, along with applications in Python.