Upgini • Data search & enrichment for Machine Learning

More relevant data sources and features significantly boost the accuracy and robustness of ML and AI models. And your internal data is just a tiny part of what’s relevant.

Simplified search and enrichment that returns only relevant, ready-to-use ML features in just 20 minutes, replacing weeks of manual data sourcing and testing.

Automated data cleaning and feature engineering from external sources with Generative AI, removing the need for data engineering or ETL.

Avoids common pitfalls of external data in ML pipelines by selecting only features that don't drift over time and that improve the accuracy of the ML model, rather than features that are merely correlated with the target variable.

Easily integrates with production ML pipelines via batch & RESTful APIs.
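
The idea of keeping only features that do not drift over time can be illustrated with a small sketch. The page does not say which drift measure Upgini uses, so the Population Stability Index (PSI) below is a swapped-in, commonly used metric, and the function names, bin count and 0.2 threshold are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence


def psi(reference: Sequence[float], recent: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of one numeric feature between two samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        eps = 1e-6  # avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref, new = shares(reference), shares(recent)
    return sum((n - r) * math.log(n / r) for r, n in zip(ref, new))


def stable_features(
    history: Dict[str, Sequence[float]],
    recent: Dict[str, Sequence[float]],
    threshold: float = 0.2,  # illustrative cutoff, not an Upgini setting
) -> List[str]:
    """Names of features whose PSI between the two periods stays below the threshold."""
    return [name for name in history if psi(history[name], recent[name]) < threshold]
```

A feature whose recent distribution matches its history gets a PSI near zero and is kept; a feature whose values have shifted gets a large PSI and is dropped before model training.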

Generative AI for enrichment & automated feature engineering

How this works

Upgini GenAI enriches any text field with facts from external data sources and generates more accurate embeddings as ready-to-use numeric features for downstream ML models:

01. Identifies entities in the text, such as product titles or point of interest (POI) names, to match against facts from the sources.

02. Finds facts for the extracted entities, such as POI descriptions and footfall traffic statistics.

03. Generates enriched embeddings for the text fields using the facts fetched from external sources.
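
The three steps above can be sketched as a toy pipeline. This is not Upgini's implementation: the fact store, the entity name "Acme Plaza", the substring matching and the hashed bag-of-words vector (a stand-in for a real embedding model) are all hypothetical and chosen only to make the data flow visible.

```python
import hashlib
import re
from typing import Dict, List

# Hypothetical fact store; entity names and facts are made-up placeholders.
FACTS: Dict[str, str] = {
    "acme plaza": "shopping mall with heavy weekend footfall",
    "acme espresso maker": "kitchen appliance in the premium price segment",
}


def find_entities(text: str) -> List[str]:
    """Step 1: find known entity names in the text (naive substring match)."""
    lowered = text.lower()
    return [name for name in FACTS if name in lowered]


def enrich_text(text: str) -> str:
    """Step 2: append the facts found for each matched entity."""
    facts = [FACTS[name] for name in find_entities(text)]
    return " ".join([text, *facts])


def embed(text: str, dim: int = 32) -> List[float]:
    """Hashed bag-of-words vector; a stand-in for a real embedding model."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def enriched_embedding(text: str, dim: int = 32) -> List[float]:
    """Step 3: embed the text together with its fetched facts."""
    return embed(enrich_text(text), dim)
```

For a text with no known entity, `enriched_embedding` returns the same vector as `embed`; when an entity matches, the vector also reflects the tokens of the attached facts.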

Example

We want to improve the accuracy of an ML model that predicts the probability that a specific client's product usage will decline (an attrition, or churn, model).

For every client in a labeled training dataset, in addition to numeric ML features, we have transcriptions of all calls to support, the history of the client's chats with support, and purchased product reviews from the website.

Simply pass these text fields as columns in the labeled dataset during the search process, and Upgini will automatically enrich them with relevant external sources and generate enriched embeddings.

Accuracy

Enriched embeddings from Upgini Generative AI are more accurate than embeddings from top commercial embedders, such as OpenAI’s ada-002.

Connected data sources

200+ public, community and premium sources, 239 countries, 40 years of data history

Frequently asked questions

I don't see the data source I need. Can you add that?

Yes, we are open to adding new data sources based on our clients' needs. Please send a support request to our customer support team, and we will do our best to accommodate it.

How often is data from the sources updated?

The frequency of data updates depends on the specific data source. Some sources are updated daily, others weekly or monthly. You can always check the update frequency of a data source directly in the search results.

I can't share client IDs like phone numbers or emails with third parties in a labeled dataset. Does Upgini support hashed individual IDs as search keys?

Yes, we understand and prioritize data privacy. Upgini supports hashed individual IDs as search keys, such as SHA-256 hashed emails. This allows us to enrich your data without exposing sensitive client information.
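
As a minimal sketch of preparing such a key, the snippet below hashes emails with SHA-256 before they leave your environment. The page does not state which normalization Upgini expects, so trimming whitespace and lower-casing are assumptions, and the column names `email` and `email_sha256` are hypothetical.

```python
import hashlib
from typing import Dict, List


def hash_email(email: str) -> str:
    """SHA-256 hex digest of an email after trimming and lower-casing (assumed normalization)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_email_column(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Replace the raw `email` field with `email_sha256` in every row."""
    hashed = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != "email"}
        out["email_sha256"] = hash_email(str(row["email"]))
        hashed.append(out)
    return hashed
```

Normalizing before hashing matters because the same address written with different casing or stray spaces would otherwise produce different digests and fail to match.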

Does your search algorithm check for data/signal leakage during the search process to avoid "signals from the future"?

Absolutely. Our search algorithm is designed to prevent data leakage: simply pass the date, or date and time, as one of the search keys in the labeled dataset. This ensures that only information known and available at the specified date is used in ML & AI model training, which avoids signals from the future.
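
The principle behind a date search key can be shown with a point-in-time lookup: for each labeled row, use only the latest external value published on or before that row's date. This is a generic sketch of the technique, not Upgini's algorithm; the function name and the sample index values are made up for illustration.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple


def value_as_of(history: List[Tuple[date, float]], as_of: date) -> Optional[float]:
    """Latest value published on or before `as_of`.

    `history` must be sorted by publication date. Returns None when nothing
    had been published yet, so a future value is never used as a feature.
    """
    dates = [published for published, _ in history]
    pos = bisect_right(dates, as_of)
    return history[pos - 1][1] if pos else None


# Made-up monthly index values, keyed by publication date.
index_history = [
    (date(2023, 1, 1), 98.0),
    (date(2023, 2, 1), 101.5),
    (date(2023, 3, 1), 99.2),
]

# A training row dated 2023-02-15 sees the February value, not the March one.
feature = value_as_of(index_history, date(2023, 2, 15))
```

A plain join on the latest available value would instead attach the March figure to the February row, which is exactly the leakage the date key is meant to rule out.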

Do you provide SLAs for the service and data accuracy?

Yes, we provide Service Level Agreements (SLAs) to our clients. These agreements include commitments on service uptime and data distribution drift. For more details, please contact our customer support team.

Make ML & AI models happier with more relevant data

No credit card required. No time limit on the Free plan.

Get started

Supercharge ML Accuracy with 200+ Data Sources

Why use Upgini

Connected data sources

Activity Data

Usage Data

Accounts Availability Data

Telecom data

Historical Weather & Climate Normals

Location Data & POIs from OpenStreetMap

Boost your data business with Upgini

Get a free PoC and a custom quote