Mughilan Muthupari

Data Scientist

This page is still in progress. However, you can download a full PDF version of the resume using the button below.

Mughilan (Mughil) is from Boyds, Maryland, USA. He has an M.S. in Data Science from Columbia University and a B.S. with a double major in Computer Science and Statistics from the University of Maryland - College Park. He currently works as an ML Engineer at Meta. Before that, he worked at the Battelle Memorial Institute as a Data Scientist and at FINRA as a Staff Developer. His industry experience revolves around machine learning and building deep learning models, including transformer models for document classification and image recognition models for biological images.

Location
50 Christopher Columbus Dr, Jersey City, New Jersey 07302, USA
Email
Phone
(240) 779-6852
Website
www.linkedin.com/in/mughil-pari/
LinkedIn
Mughil Pari

Experience

present

ML Engineer at Meta - Social Impact Ecosystems (SIE)

The Social Impact Ecosystems (SIE) group at Meta is part of the larger Central Social Impact (CSI) umbrella. The central group focuses on decreasing prevalence across a number of social and sensitive issues, most notably teen safety, along with elections, health, and misinformation. Its work spans the entire family of apps: Facebook, Instagram, and Threads.

Highlights

  • Part of the SIE Ranking and Recommendations Team, which primarily develops ranking algorithms to reduce teens' exposure to violating content.

Data Scientist II at Battelle Memorial Institute

Battelle Memorial Institute (BMI) is a science and technology non-profit that performs extensive contract work for the government. Mughil was a contractor for Battelle for 6 months until late September, and then became a full-time employee.

Highlights

  • Developed an image recognition model, along with interpretability processes, to examine spectral k-mer correlation maps and extract biological pathways in rice and corn
  • Modeled evolving bioplumes using graph neural networks and general multi-layer perceptron (MLP) networks
  • Researched using clustering and recurrent neural networks to detect patterns in human-generated random digits for potential use in an authentication system

Developer at FINRA

The Financial Industry Regulatory Authority (FINRA) oversees and regulates broker-dealers in the United States. Mughil was a contractor for FINRA for approximately two and a half years until October 2021, and then transitioned to a full-time role.

Highlights

  • Applied explainability techniques such as LIME and SHAP to risk predictions. Communicated analysis results to stakeholders as Team Lead.
  • Developed and maintained an advert classification system that used CNNs as the main model. Various transformer models such as BERT and Longformer were also researched and tested.
  • Prototyped data pipelines and storage in S3 using Apache Spark (PySpark)
  • Prototyped abstractive data summarization using the BERT LLM from Hugging Face
  • Used CNNs to archive U5 filings more efficiently. AWS SageMaker was tested as a viable ML training system.

Software Intern at Moody's Investors Service

A summer internship at Moody’s Investors Service.

Highlights

  • Developed an NLP NER pipeline that combines weak supervision and active learning to support a large-scale human labeling effort. Labor costs were reduced by about 70%.
  • Heuristic labeling functions were written in Snorkel, and human labels were gathered using an annotation tool.

Research Intern at NASA Center for Climate Simulations

A summer internship at the NASA Center for Climate Simulations that revolved around analyzing high-volume, high-frequency temperature data. Different analytic systems were compared and examined for anomalies.

Highlights

  • Used the Advanced Data Analytics Platform (ADAPT) to examine historical daily temperature cycles and perform other statistical calculations.
  • Compared calculations across different systems to detect discrepancies.
  • Presented findings at a poster symposium.

Education

Master's in Data Science from Columbia University

Courses

  • CSOR4246 - Algorithms for Data Science
  • IEOR4722 - Stochastic Controls and Financial Applications
  • STAT5702 - Exploratory Data Analysis and Visualization
  • COMS4995 - Applied Deep Learning
  • FINC8306 - Capital Markets and Investments
  • INAF6506 - Data Science and Public Policy
  • STAT5703 - Statistical Inference and Modeling
  • IEOR4735 - Structured and Hybrid Products
  • MATH5220 - Quantitative Methods and Investments

Bachelor's in Computer Science and Statistics from University of Maryland - College Park

Publications

Where's the Learning in Representation Learning for Compositional Semantics and the Case of Thematic Fit by EMNLP Blackbox

A series of thematic fit NLP deep learning experiments examining where in a neural network the actual learning takes place. Is it in the word embeddings? In the body of the network? Or maybe even in the output layer?

An Immersive Experience: Visualizing Large-Scale Climate Data using Virtual Reality and Infrared Hand-Tracking Technology by UMD Gemstone

A novel way to visualize large climate datasets using virtual reality and infrared sensors, as current visualization technology is too outdated and unintuitive to support swift climate analyses and decisions.

Skills

AI/ML
Level: Master
Keywords:
  • machine learning
  • deep learning
  • image recognition
  • image segmentation
  • time series
  • natural language processing
  • reinforcement learning
  • transformers
  • CNN
  • GNN
  • autoencoder
  • model interpretability
MLOps and Technologies
Level: Master
Keywords:
  • PyTorch
  • TensorFlow
  • PyTorch Lightning
  • MLflow
  • Weights and Biases
  • data drift
  • Apache Spark
  • data pipelines
  • SageMaker
  • Jenkins
  • GitHub Actions
  • PostgreSQL
Cloud Technologies
Level: Master
Keywords:
  • AWS
  • GCP
  • S3
  • EC2
  • CloudWatch
Programming Languages
Level: Master
Keywords:
  • Python
  • R
  • SQL
  • Java