January 2023

Freshdesk Connector is Live

  • All Published articles from an end user’s Freshdesk knowledge base are synced when connected to Carbon.

  • The Carbon Connect enabledIntegrations value is FRESHDESK.

  • You can find more info here.

Speed Improvements to Hybrid Search

  • We made hybrid search 10x faster by creating sparse vector indexes at file upload time instead of at query time.

    • Steps to Enable:

      • Pass the following body to the /modify_user_configuration endpoint: { "configuration_key_name": "sparse_vectors", "value": { "enabled": true } }

      • Set the parameter generate_sparse_vectors to true via the /uploadfile endpoint.

  • We’ll be rolling out faster hybrid search support across third-party connectors in the coming weeks.

  • Find more details here and here.
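
The enable steps above could be sketched in Python as follows. This is a minimal sketch that only builds the requests and does not send them. The endpoint paths, the request body, and the generate_sparse_vectors parameter come from the notes above; the base URL, the authorization scheme, the use of POST, passing generate_sparse_vectors as a query parameter, and all helper names are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_headers(api_key: str, customer_id: str) -> dict:
    """Headers for a request made on behalf of one end user."""
    return {
        # Assumption: the authorization scheme shown here is illustrative only.
        "authorization": f"Bearer {api_key}",
        "customer-id": customer_id,
        "Content-Type": "application/json",
    }


def enable_sparse_vectors_request(api_key: str, customer_id: str) -> Request:
    """Step 1: build the /modify_user_configuration request."""
    body = {
        "configuration_key_name": "sparse_vectors",
        "value": {"enabled": True},
    }
    return Request(
        f"{BASE_URL}/modify_user_configuration",
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key, customer_id),
        method="POST",  # Assumption: the HTTP method is not stated in the notes.
    )


def upload_url_with_sparse_vectors() -> str:
    """Step 2: build an /uploadfile URL with generate_sparse_vectors set.

    Assumption: the flag is passed as a query parameter.
    """
    query = urlencode({"generate_sparse_vectors": "true"})
    return f"{BASE_URL}/uploadfile?{query}"


if __name__ == "__main__":
    req = enable_sparse_vectors_request("YOUR_API_KEY", "end-user-123")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    print(upload_url_with_sparse_vectors())
```

Building the requests separately from sending them keeps the payload easy to inspect before pointing the code at a real account.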

Deleting Files based on Sync Status

  • You can now delete file(s) based on sync_status via the delete_files endpoint.

  • We added 2 parameters:

    • sync_statuses - parameter to pass a list of sync statuses for file deletion.

      • For example, { "sync_statuses": ["SYNC_ERROR", "QUEUED_FOR_SYNC"] }. When this value is passed, we delete all files in the SYNC_ERROR or QUEUED_FOR_SYNC status that belong to the end user identified by the customer-id header of the request.

    • delete_non_synced_only - boolean parameter that limits deletion to files that have never been synced before.

      • For example, a previously synced Google Drive file enters the QUEUED_FOR_SYNC status again during a scheduled re-sync. Setting delete_non_synced_only to true prevents this file from being deleted.

  • Files can be deleted in any status except SYNCING, EVALUATING_RESYNC, and QUEUED_FOR_OCR. Including any of these three statuses in the list results in an error response; files in these statuses can only be deleted after they transition to another status.

  • Find more details here.
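
As an illustration of the two parameters, here is a minimal Python sketch that builds a request body for deleting by sync status. The parameter names and status values come from the notes above; the helper name, the default value of delete_non_synced_only, and the client-side check (which only mirrors the error the API is described as returning) are assumptions.

```python
import json

# Statuses that, per the notes above, cannot be deleted until they transition.
NON_DELETABLE_STATUSES = {"SYNCING", "EVALUATING_RESYNC", "QUEUED_FOR_OCR"}


def build_delete_files_body(sync_statuses, delete_non_synced_only=False):
    """Return a JSON-serializable body for a delete-by-status request.

    Raises ValueError before any request is made if a non-deletable
    status is included (hypothetical client-side guard).
    """
    blocked = sorted(set(sync_statuses) & NON_DELETABLE_STATUSES)
    if blocked:
        raise ValueError(f"Cannot delete files in status: {', '.join(blocked)}")
    return {
        "sync_statuses": list(sync_statuses),
        "delete_non_synced_only": delete_non_synced_only,
    }


if __name__ == "__main__":
    body = build_delete_files_body(
        ["SYNC_ERROR", "QUEUED_FOR_SYNC"],
        delete_non_synced_only=True,
    )
    print(json.dumps(body))
```

Checking the statuses locally is optional, but it surfaces the mistake before a network round trip.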

Carbon Connect Updates

  • Added support for the following functionalities in Carbon Connect (React component + JavaScript SDK):

    • Additional embedding models (OPENAI, AZURE_OPENAI, COHERE_MULTILINGUAL_V3 for text and audio files, and VERTEX_MULTIMODAL for image files).

    • Audio and image file support. Reference documentation on supported file formats is available.

    • OCR support for PDFs from local file uploads via Carbon Connect.

    • Hybrid search support.

Remove Customer-Id on Select Endpoints

  • customer-id is no longer a required header for the following endpoints, which do not need it:

    • /auth/v1/white_labeling

    • /user

    • /webhooks

    • /add_webhook

    • /delete_webhook/{webhook_id}

    • /organization
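
A client wrapper could reflect this change with a small helper that only attaches customer-id where it is needed. The sketch below is illustrative: the endpoint list comes from the notes above, while the helper names, the authorization scheme, and the assumption that every other endpoint still requires customer-id are hypothetical.

```python
# Endpoints listed above that no longer require the customer-id header.
# An entry ending in "/" is matched as a prefix, which covers
# /delete_webhook/{webhook_id}; all other entries are matched exactly.
NO_CUSTOMER_ID_ENDPOINTS = (
    "/auth/v1/white_labeling",
    "/user",
    "/webhooks",
    "/add_webhook",
    "/delete_webhook/",
    "/organization",
)


def needs_customer_id(path: str) -> bool:
    """Return False for the endpoints listed above, True otherwise.

    Assumption: every endpoint not in the list still requires customer-id.
    """
    for entry in NO_CUSTOMER_ID_ENDPOINTS:
        if entry.endswith("/"):
            if path.startswith(entry):
                return False
        elif path == entry:
            return False
    return True


def build_headers(path: str, api_key: str, customer_id=None) -> dict:
    """Build request headers, adding customer-id only where it is needed."""
    # Assumption: the authorization scheme shown here is illustrative only.
    headers = {"authorization": f"Bearer {api_key}"}
    if needs_customer_id(path):
        if customer_id is None:
            raise ValueError(f"customer-id is required for {path}")
        headers["customer-id"] = customer_id
    return headers


if __name__ == "__main__":
    print(build_headers("/organization", "YOUR_API_KEY"))
    print(build_headers("/delete_files", "YOUR_API_KEY", "end-user-123"))
```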

Vector Database Integration

  • We are starting to build out direct integrations with vector database providers!

  • What this means:

    • After you authenticate a vector database provider via API key, Carbon automatically keeps the embeddings in your vector database in sync with your users’ data sources. Whenever a user file is processed, we update your vector database with the latest embeddings.

    • You’ll have full access to all of Carbon’s API endpoints, including hybrid search if your vector database supports sparse vector storage.

    • Migrations between vector databases are simple because Carbon provides a unified API to interface with all providers.

  • The first vector database integration we’re announcing is with Turbopuffer. Many more to come!

COPYRIGHT @ 2024 JCDT DBA CARBON