We have connected 50+ global public data sources, such as OpenStreetMap, NOAA Weather & Climate data, the OECD Consumer Confidence Index, Ookla Speedtest, and Mozilla Location Services cell coverage, plus more than 150 premium data sources.
Upgini checks data quality, cleans the data, and generates new machine learning (ML) features on the fly using advanced methods such as LLM data augmentation, graph neural networks (GraphNN), and recurrent neural networks (RNN).
These transformations produce more feature candidates, which can significantly improve the accuracy of your ML models.
No data source needs to be uploaded to Upgini's infrastructure: all operations are performed within the isolated environments of the data owners. The Upgini platform only handles and passes search results to end consumers.
from upgini import FeaturesEnricher, SearchKey

enricher = FeaturesEnricher(
    # Choose one or more columns as search keys
    search_keys={
        'rep_date': SearchKey.DATE,
        'country': SearchKey.COUNTRY,
        'post_code': SearchKey.POSTAL_CODE,
        'hem': SearchKey.HEM,
        'email': SearchKey.EMAIL,
        'ip_addr': SearchKey.IP,
        'phone_num': SearchKey.PHONE
    },
    # Select columns for automated feature generation
    generate_features=['put_your_text_features_here'],
    api_key='put_your_api_key_here',
)

# Run search
enricher.fit(X_train, y_train)

from upgini import FeaturesEnricher, SearchKey

enricher = FeaturesEnricher(
    # Choose one or more columns as search keys
    search_keys={
        'rep_date': SearchKey.DATE,
        'country': SearchKey.COUNTRY,
        'post_code': SearchKey.POSTAL_CODE,
    },
    api_key='put_your_api_key_here',
)

# Run search
enricher.fit(X_train, y_train)
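
Conceptually, the search keys act as join keys between your training rows and external data sources. The sketch below illustrates only that idea with plain Python dictionaries. It is not Upgini's implementation, and `EXTERNAL_SOURCE`, `enrich_rows`, and the `avg_temp_c` feature are hypothetical names with made-up placeholder values.

```python
from __future__ import annotations

from typing import Any

# Hypothetical external source keyed by (date, country, postal code).
# The values are made-up placeholders, not real weather data.
EXTERNAL_SOURCE: dict[tuple[str, ...], dict[str, Any]] = {
    ("2024-01-01", "US", "10001"): {"avg_temp_c": 1.5},
    ("2024-01-01", "GB", "SW1A"): {"avg_temp_c": 6.0},
}

# Column names mirror the search keys used in the snippet above.
SEARCH_KEY_COLUMNS = ("rep_date", "country", "post_code")


def enrich_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Left-join each row with the external source on the search key columns."""
    enriched = []
    for row in rows:
        key = tuple(row[col] for col in SEARCH_KEY_COLUMNS)
        # Rows without a match keep their columns; the new feature is None.
        extra = EXTERNAL_SOURCE.get(key, {"avg_temp_c": None})
        enriched.append({**row, **extra})
    return enriched


rows = [
    {"rep_date": "2024-01-01", "country": "US", "post_code": "10001", "target": 1},
    {"rep_date": "2024-01-02", "country": "US", "post_code": "10001", "target": 0},
]
result = enrich_rows(rows)
```

The first row matches the external source on all three keys and gains a feature value; the second has no match for its date, so its new feature stays empty while the original columns are preserved.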
10-25% accuracy improvement over baseline results from mainstream AutoML frameworks
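
One way to check a figure like this on your own data is to compare the same model's metric before and after enrichment. The sketch below assumes the improvement is measured as a relative change in a higher-is-better metric. The page does not say whether the 10-25% figure is relative or absolute, and `relative_uplift` is a hypothetical helper, not part of the Upgini library.

```python
def relative_uplift(baseline: float, enriched: float) -> float:
    """Return the relative change of a higher-is-better metric, in percent."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (enriched - baseline) / baseline * 100.0


# Illustrative numbers only, not measured results.
uplift = relative_uplift(baseline=0.80, enriched=0.88)
print(f"{uplift:.1f}% relative improvement")
```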
# enrich dataset with external features
enriched_featurespace = enricher.transform(enrich)
enriched_featurespace.head()
Enrich production datasets with up-to-date features and data
enricher = FeaturesEnricher(
    # Same set of search keys as for the fit step
    search_keys={
        "date": SearchKey.DATE
    },
    search_id="abcdef00-0000-0000-0000-999999999999"
)
enriched_prod_dataframe = enricher.transform(input_dataframe)
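
Because the production call reuses the search keys from the fit step, it can help to confirm that the production data actually contains those columns before calling `transform`. This is a minimal sketch, assuming column names are available as plain strings. `missing_search_key_columns` is a hypothetical helper, not part of the Upgini library, and the `store_id` and `sales` columns are illustrative.

```python
from __future__ import annotations

from typing import Iterable


def missing_search_key_columns(
    columns: Iterable[str], search_key_columns: Iterable[str]
) -> list[str]:
    """Return search key columns that are absent from the dataset, sorted by name."""
    present = set(columns)
    return sorted(col for col in search_key_columns if col not in present)


# Example: the fit step used a "date" search key.
missing = missing_search_key_columns(
    columns=["date", "store_id", "sales"],
    search_key_columns=["date"],
)
if missing:
    raise ValueError(f"Production data is missing search key columns: {missing}")
```

With a pandas DataFrame, `input_dataframe.columns` can be passed as `columns`.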
Talk to a team of experts awarded Gartner Cool Vendor status, driving AI & ML data monetization innovations since 2015