Trusted by data scientists and data engineers
Automated data source optimizations for ML models:
If properly prompted with context from all relevant external data, an LLM significantly improves the quality of its embeddings for text fields in a source.
OpenStreetMap is an example of a graph data source.
Thus, if multiple sources with different error distributions are used, their ensemble will have better accuracy. This is similar to a consensus forecast.
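The consensus effect can be illustrated with a small simulation. This is a minimal sketch under a stated assumption, not a description of how Upgini combines sources: each of three hypothetical sources reports the true value plus independent, zero-mean Gaussian noise, and the ensemble is a plain average. The names `N_SOURCES`, `rmse`, and the noise level are illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

N_SAMPLES = 10_000
N_SOURCES = 3


def rmse(errors):
    """Root-mean-square of a list of errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Assumption: every source reports the true value plus its own
# independent, zero-mean Gaussian error (standard deviation 1.0).
source_errors = [
    [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]
    for _ in range(N_SOURCES)
]

# Consensus estimate: averaging the sources also averages their errors.
ensemble_errors = [statistics.fmean(col) for col in zip(*source_errors)]

source_rmse = [rmse(errs) for errs in source_errors]
ensemble_rmse = rmse(ensemble_errors)

print("per-source RMSE:", [round(r, 3) for r in source_rmse])
print("ensemble RMSE:  ", round(ensemble_rmse, 3))
```

With independent errors, the averaged error shrinks roughly by a factor of the square root of the number of sources. The gain is smaller when the sources' errors are correlated, which is why sources with different error distributions are the useful ones to combine.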
If it finds the relevant information, it will automatically add a new search key - in this case, the postal code for each IP. This enables searching through all geo data sources in addition to IP sources.
Upgini Generative AI can automatically enrich any text fields with relevant facts from external data sources and generate ready-to-use numeric features from enriched representations of text fields.
Upgini Gen AI does this in the following steps:
(1) Finds entities in the text to match facts from external data sources. Simple examples are the company name, car model, product title, and place of interest (POI) name.
(2) Detects contextual information for extracted entities from external sources. A simple example is a geographical location for a company.
(3) Generates enriched embeddings for text fields using a process similar to Retrieval-augmented generation (RAG) with facts fetched from external sources and specific contextual information.
These enriched embeddings for text fields can be used as numeric features to enhance the accuracy of downstream ML models.
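The three steps above can be sketched with a toy example. This is only an illustration of the general idea and not Upgini's implementation: `EXTERNAL_FACTS` is a hypothetical stand-in for an external data source, entity matching is a plain substring lookup, and `embed` is a hashed bag-of-words vector standing in for a real LLM embedder.

```python
import hashlib
import re

# Hypothetical stand-in for an external data source: entity -> known facts.
EXTERNAL_FACTS = {
    "acme corp": ["industry: logistics", "location: Berlin, Germany"],
    "globex": ["industry: energy", "location: Austin, USA"],
}


def find_entities(text):
    """Step 1: find known entities mentioned in the text."""
    lowered = text.lower()
    return [name for name in EXTERNAL_FACTS if name in lowered]


def fetch_context(entities):
    """Step 2: collect contextual facts for each matched entity."""
    return [fact for name in entities for fact in EXTERNAL_FACTS[name]]


def embed(text, dim=16):
    """Toy embedder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def enriched_embedding(text, dim=16):
    """Step 3: embed the text together with the retrieved facts."""
    facts = fetch_context(find_entities(text))
    return embed(" ".join([text, *facts]), dim)


features = enriched_embedding("Support call with Acme Corp about billing")
```

The resulting `features` list is a fixed-length numeric vector, so it can be appended to the other numeric columns of a training dataset. Text that mentions no known entity falls back to the plain embedding.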
We want to improve the accuracy of an ML model that predicts the probability of product usage decline for a specific client (known as an attrition or churn model).
For every client in a labeled training dataset, in addition to numeric ML features, we have transcriptions of all calls to support, the history of the client's chats with support, and purchased product reviews from the website.
Simply pass this text as columns in a labeled dataset during the search process, and Upgini will automatically enrich these columns with relevant external sources and generate enriched embeddings.
Enriched embeddings from Upgini Generative AI are more accurate than embeddings from top commercial embedders, such as OpenAI’s ada-002.
More details on the comparison can be found in the Medium publication.
Air temperature
Precipitation
Wind
Air pressure
Normals
Sun hours
Moon phase
POI Categories:
Schools, restaurants, hotels, supermarkets, etc.
Houses:
Residential buildings, business centers, etc.
Transport infrastructure:
Roads, public transport stops, etc.
Public facilities:
Government offices, post offices, police stations, etc.
Natural features:
Public parks, green areas, etc.
Stats for different distances (1 km / 3 km / 5 km)
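As an illustration of what a distance-based statistic can look like, the sketch below counts points of interest within 1 km, 3 km, and 5 km of a location using the haversine great-circle distance. It is a hypothetical example, not Upgini's feature definition: the function names, the feature naming scheme, and the choice of a simple count are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def poi_counts(lat, lon, pois, radii_km=(1, 3, 5)):
    """Count POIs within each radius; pois is a list of (lat, lon) pairs."""
    distances = [haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in pois]
    return {f"poi_count_{r}km": sum(d <= r for d in distances)
            for r in radii_km}


# Hypothetical POIs placed along the equator, east of the origin.
sample_pois = [(0.0, 0.005), (0.0, 0.02), (0.0, 0.04), (0.0, 0.1)]
counts = poi_counts(0.0, 0.0, sample_pois)
```

Each radius yields one numeric feature per location, so the same POI category produces three columns at the 1 km, 3 km, and 5 km scales.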
Workweek calendars by country
Public holidays / Observed holidays
Religious holidays
Sporting events
Political events
Consumer price index
GDP
Central bank rates
Commodity prices
Stock prices
Stock volumes
Currencies and exchange rates
Market indexes
Step by step guide