We do federated data search

We have connected 50+ global public data sources, such as OpenStreetMap, NOAA Weather & Climate data, the OECD Consumer Confidence Index, Ookla Speedtest, and Mozilla Location Services Cell Coverage.

And more than 150 premium data sources.

Upgini checks data quality, cleans the data, and generates new machine learning (ML) features on-the-fly using advanced methods like LLM data augmentation, GraphNN, and RNN.

These transformations result in more feature candidates that significantly improve the accuracy of your ML models.

No data source needs to be uploaded to Upgini’s infrastructure: all operations are performed within the isolated environments of the data owners. The Upgini platform only handles and passes search results to the end consumers.

Get started with Python

Step-by-step guide
Step 1
Install Upgini library

...from PyPI and check out our documentation on GitHub (it's open-source)

%pip install upgini
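
To confirm the install before moving on, a minimal standard-library check can look like this. The helper name `installed_version` is hypothetical and not part of Upgini; it only asks Python's packaging metadata whether a distribution is present.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether the upgini distribution is visible to this interpreter
version = installed_version("upgini")
print(f"upgini version: {version}" if version else "upgini is not installed")
```

If the check reports that the package is missing inside a notebook, the kernel may need a restart after `%pip install`.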
Step 2
Select data enrichment keys and initiate feature search

You can reuse your existing labeled training dataset. Only relevant features that improve the metric (ROC AUC, RMSE, etc.) are returned, not just features correlated with the target variable.

from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(
  # Choose one or more columns as search keys
  search_keys={
    'rep_date': SearchKey.DATE,
    'country': SearchKey.COUNTRY,
    'post_code': SearchKey.POSTAL_CODE,
    'hem': SearchKey.HEM,
    'email': SearchKey.EMAIL,
    'ip_addr': SearchKey.IP,
    'phone_num': SearchKey.PHONE
  },
  # Select columns for automated feature generation
  generate_features = ['put_your_text_features_here'],
  api_key = 'put_your_api_key_here',
)
# Run search
enricher.fit(X_train, y_train)
# Minimal variant: date, country, and postal code search keys only
from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(
  # Choose one or more columns as search keys
  search_keys={
    'rep_date': SearchKey.DATE,
    'country': SearchKey.COUNTRY,
    'post_code': SearchKey.POSTAL_CODE,
  },
  api_key = 'put_your_api_key_here',
)
# Run search
enricher.fit(X_train, y_train)
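
The first snippet lists both an `email` and a `hem` (hashed email) search key. If only raw emails are available and you prefer not to send them in clear text, a hashed column could be prepared along these lines. This is a sketch under an assumption: that HEM means the SHA-256 hex digest of the trimmed, lowercased address, which is a common convention. The helper `email_to_hem` is hypothetical; check the Upgini documentation for the exact format it expects.

```python
import hashlib


def email_to_hem(email: str) -> str:
    """Hash an email address into a 64-character SHA-256 hex digest.

    Assumption: normalization is strip + lowercase before hashing.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical addresses, used only to show that normalization keeps the hash stable
same = email_to_hem("User@Example.com ") == email_to_hem("user@example.com")
print(same)
```

Normalizing before hashing matters because a hash of `User@Example.com` and a hash of `user@example.com` would otherwise never match.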
Step 3

Enrich ML model with new features and retrain

10-25% accuracy improvement over baseline results from mainstream AutoML frameworks

# Enrich a dataset with external features; `enrich` is the dataset to enrich, with the same search key columns as in fit
enriched_featurespace = enricher.transform(enrich)
enriched_featurespace.head()
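
To judge whether the new features actually help, compare the same metric for a model trained on the original columns and one trained on the enriched columns. The sketch below is not Upgini code: it implements ROC AUC in plain Python from its pairwise definition and a hypothetical `relative_uplift` helper, and the labels and scores are toy values chosen only to exercise the functions.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def relative_uplift(baseline_metric, enriched_metric):
    """Relative change of the enriched metric against the baseline metric."""
    return (enriched_metric - baseline_metric) / baseline_metric


# Toy labels and model scores (illustrative only, not real results)
labels = [0, 0, 1, 1]
baseline_scores = [0.1, 0.4, 0.35, 0.8]
enriched_scores = [0.1, 0.3, 0.35, 0.8]

baseline_auc = roc_auc(labels, baseline_scores)
enriched_auc = roc_auc(labels, enriched_scores)
print(baseline_auc, enriched_auc, relative_uplift(baseline_auc, enriched_auc))
```

In practice both metrics should come from the same held-out split or cross-validation folds, so the comparison reflects the features rather than the sampling.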
Step 4
Add external features into production ML pipeline

Enrich production datasets with up-to-date features and data

from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(
  # Same set of search keys as in the fit step
  search_keys = {
    "date": SearchKey.DATE
  },
  search_id = "abcdef00-0000-0000-0000-999999999999"
)
enriched_prod_dataframe = enricher.transform(input_dataframe)
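
Because the production call relies on the same search key columns as the fit step, a small guard before `transform` can catch a mismatched dataset early. This is an illustrative sketch: `missing_search_key_columns` is a hypothetical helper, not part of Upgini, and the column names are examples.

```python
def missing_search_key_columns(columns, search_key_columns):
    """Return the search key column names that are absent from the dataset columns."""
    available = set(columns)
    return [name for name in search_key_columns if name not in available]


# Example column names (hypothetical); with pandas you would pass input_dataframe.columns
production_columns = ["date", "store_id", "sales"]
required_keys = ["date", "country"]

missing = missing_search_key_columns(production_columns, required_keys)
if missing:
    print(f"Cannot enrich: missing search key columns {missing}")
```

Failing fast here keeps a broken upstream export from reaching the enrichment call.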