April 2024

Support for Solar Embeddings

  • Exciting news! We’ve integrated Upstage’s Solar Embeddings into our platform, offering you a powerful new embedding model on Carbon.

  • To use this embedding model, specify the slug SOLAR for embedding_model.

  • You can find more details here.

FILE_CREATED for Web Scrape

  • We have expanded the FILE_CREATED webhook events to fire when files are generated from web scraping requests.

IS_RESYNC for FILE_READY Webhook

  • We’ve added a new boolean property additional_information.is_resync to the FILE_READY webhook event.

    • When it is false, the file was synced for the first time.

    • When it is true, the file was already synced previously so the current sync is a re-sync.
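
As a minimal sketch, a FILE_READY handler could branch on this flag as shown below. It assumes the additional_information object has already been parsed into a dictionary; the function name and the decision to treat a missing flag as a first sync are illustrative assumptions.

```python
def classify_file_ready(additional_information: dict) -> str:
    """Classify a FILE_READY event as a first sync or a re-sync."""
    # is_resync is the boolean described above. Treating a missing flag
    # as a first sync is an assumption made for this sketch.
    if additional_information.get("is_resync") is True:
        return "re-sync"
    return "first sync"


# Hypothetical, already-parsed additional_information objects.
print(classify_file_ready({"is_resync": False}))
print(classify_file_ready({"is_resync": True}))
```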

Carbon Connect 2.0 Is Exiting Beta

  • Carbon Connect 2.0 is exiting beta by this Friday!

  • This means if you run npm install carbon-connect moving forward and do not specify a version, we’ll install 2.0 by default.

  • If you need help or have any questions moving over to Carbon Connect 2.0, DM me.

Loading Screen for Carbon Connect 2.0 (carbon-connect@2.0.0-beta22)

  • We added a new component-level prop, loadingIconColor, which defines the color of the loader icon. It accepts standard CSS color names, hexadecimal (hex) codes, or RGB color values.

Support for Google Drive Shortcuts

  • Users can now seamlessly sync Google Drive shortcuts to reference the files and folders they point to.

    • How It Works:

      • For shortcuts within folders, a file object will be generated. When this shortcut file is synced, it will also synchronize its targeted file separately, though not as a child. Please note, there is no hierarchical relationship between a shortcut and its target.

      • If the shortcut is directly selected from Google’s file picker, a shortcut file object will not be created. Instead, the target will be synced directly.

      • Importantly, the shortcut file itself will not contain any parsed text or chunks. Instead, it acts as a pointer, with the file_metadata.target_external_file_id attribute identifying the file the shortcut targets.
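
Because a shortcut and its target have no parent/child link, the pointer has to be followed explicitly. The sketch below shows one way to do that on the client side. Only file_metadata.target_external_file_id comes from the note above; the function name, the external_file_id key, and the lookup dictionary are assumptions for illustration.

```python
from typing import Optional


def resolve_shortcut(file_obj: dict, files_by_external_id: dict) -> Optional[dict]:
    """Return the synced target of a shortcut, or None if there is none."""
    metadata = file_obj.get("file_metadata") or {}
    target_id = metadata.get("target_external_file_id")
    if target_id is None:
        return None  # Not a shortcut file object.
    # The target is synced separately, so look it up by its external ID.
    return files_by_external_id.get(target_id)


# Hypothetical records; only file_metadata.target_external_file_id
# is taken from the release note.
target = {"id": 2, "external_file_id": "doc-abc", "name": "Plan.docx"}
shortcut = {"id": 1, "file_metadata": {"target_external_file_id": "doc-abc"}}
index = {target["external_file_id"]: target}

print(resolve_shortcut(shortcut, index)["name"])
```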

New Webhook Events

  • We’ve introduced 2 additional webhook events to help track file sync statuses:

    • FILE_CREATED: This event is fired when a user queues up a file to be synced for the first time. The body of the webhook will contain a list of file_ids for files that were created in the same upload, and multiple events could fire for the same upload if a lot of files were queued.

    • ALL_UPLOADED_FILES_QUEUED: This event is fired when every single item in an upload has been queued for sync, including all children of folders in an upload. The body will contain the upload’s request_id.

  • A couple of notes:

    • Both file_ids and request_ids can be used to filter for the files in /user_files_v2.

    • A request_id is now always generated for an upload to support the ALL_UPLOADED_FILES_QUEUED webhook. Previously, it was only generated on your side (unless you’re using Carbon Connect) and passed to us as a parameter. You may still do that and we’ll use your request_id, but if you don’t, we’ll generate a request_id for you on behalf of the user’s upload.

    • These two webhooks currently are supported for 3rd party data sources only. Support for web scrapes and local file uploads will be coming soon.

  • You can find more details here.
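
One way to consume these two events is to accumulate file_ids until the matching ALL_UPLOADED_FILES_QUEUED event arrives. The sketch below assumes each webhook has already been parsed into an event name and a body dictionary; the class name and the exact body layout are assumptions, not a documented schema.

```python
class UploadTracker:
    """Collect file IDs and request IDs reported by the two webhook events."""

    def __init__(self) -> None:
        self.created_file_ids: list = []
        self.fully_queued_requests: set = set()

    def handle(self, webhook_type: str, body: dict) -> None:
        if webhook_type == "FILE_CREATED":
            # Several FILE_CREATED events can arrive for one large upload,
            # so accumulate instead of overwriting.
            self.created_file_ids.extend(body.get("file_ids", []))
        elif webhook_type == "ALL_UPLOADED_FILES_QUEUED":
            self.fully_queued_requests.add(body["request_id"])
        # Other event types are ignored in this sketch.


tracker = UploadTracker()
tracker.handle("FILE_CREATED", {"file_ids": [101, 102]})
tracker.handle("FILE_CREATED", {"file_ids": [103]})
tracker.handle("ALL_UPLOADED_FILES_QUEUED", {"request_id": "req-1"})
print(tracker.created_file_ids, tracker.fully_queued_requests)
```

Either the collected file_ids or the request_id could then be used as a filter against /user_files_v2.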

GitHub Connector

  • We launched our GitHub integration today that syncs pages from both public and private repositories.

  • The Carbon Connect enabledIntegration slug for GitHub is GITHUB. You’ll need to update to 2.0.0-beta19 to access the new screen.

  • Users should first submit their GitHub username and access token to our integration endpoint at /integrations/github. You can then use our global endpoints for listing and syncing specific files in different repositories:

    • List files from repositories with the global endpoint /integrations/items/list

    • Sync files from repositories with the global endpoint /integrations/files/sync

  • See more specifics about our GitHub integration here.

Set Max Files Per Upload

  • A new user-level parameter, max_files_per_upload, has been introduced that can be modified via the /update_users endpoint. It determines the maximum number of files a user can upload in a single request.

    • Files beyond the max_files_per_upload limit will be moved into the SYNC_ERROR status, and webhooks will fire to alert you.

  • You can check the file_single_upload_limit set for a particular user via the user endpoint.

  • Find more details here.

  • Important Update: The parameter max_files now serves to establish the overall file upload limit for a user across all uploads.
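
To make the per-upload limit concrete, here is a small local model of the behavior described above. It is not Carbon's implementation: it assumes files are counted in request order, and "ACCEPTED" is a placeholder label rather than a real Carbon status. Only max_files_per_upload and SYNC_ERROR come from the notes.

```python
def apply_upload_limit(file_names: list, max_files_per_upload: int) -> dict:
    """Map each file in one upload request to a status label."""
    statuses = {}
    for position, name in enumerate(file_names):
        # Assumption: files are counted in request order, and everything
        # past the limit is the part that ends up in SYNC_ERROR.
        within_limit = position < max_files_per_upload
        statuses[name] = "ACCEPTED" if within_limit else "SYNC_ERROR"
    return statuses


result = apply_upload_limit(["a.pdf", "b.pdf", "c.pdf"], max_files_per_upload=2)
print(result)
```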

Add include_all_children to Embeddings Endpoint

  • Added param include_all_children to the embeddings endpoint. When this param is set to true, the search is run over all filtered files as well as their children.

  • Filters applied to the endpoint extend to the returned child files.
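
The effect of include_all_children can be pictured as expanding the filtered file set before the search runs. The sketch below models that expansion locally. It assumes each file record has id and parent_id keys and that "all children" means every descendant, not just direct children; both are assumptions.

```python
from collections import defaultdict


def expand_with_children(files: list, selected_ids: list) -> list:
    """Return the selected file IDs plus the IDs of all their descendants."""
    children = defaultdict(list)
    for f in files:
        if f.get("parent_id") is not None:
            children[f["parent_id"]].append(f["id"])

    seen = set(selected_ids)
    stack = list(selected_ids)
    while stack:
        current = stack.pop()
        for child_id in children[current]:
            if child_id not in seen:
                seen.add(child_id)
                stack.append(child_id)
    return sorted(seen)


# Hypothetical hierarchy: 1 -> 2 -> 3, with 4 unrelated.
files = [
    {"id": 1, "parent_id": None},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": 2},
    {"id": 4, "parent_id": None},
]
print(expand_with_children(files, [1]))
```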

In-House File Picker for Confluence and Salesforce

  • We’re excited to introduce our in-house file picker, starting with Confluence and Salesforce. Our in-house file picker is still in beta, but you can test it out by manually running npm install carbon-connect@2.0.0-beta13

  • With this update, end users gain the ability to directly select and upload specific files from Confluence and Salesforce. Previously, this functionality was unavailable as neither platform offered their own dedicated file pickers.

  • When syncFilesOnConnection is set to false, our file picker is enabled.

  • Here’s a quick walkthrough I recorded.

Hiding 3rd-Party File Picker

  • The endpoints /integrations/oauth_url and /integrations/connect now support a new boolean parameter named enable_file_picker.

    • When enable_file_picker is set to true (the default), a button is displayed on the success page. Clicking this button opens the file picker associated with the respective source.

    • Conversely, setting enable_file_picker to false will hide the file picker button on the success page. In such cases, end users will be directed to use custom or in-house file pickers for file selection.

Sync Outlook and Gmail Attachments

  • We’ve introduced a new property called sync_attachments, which can be specified when syncing via /integrations/gmail/sync and /integrations/outlook/sync endpoints. By default, this property is set to false.

  • Setting sync_attachments to true enables Carbon to automatically sync file attachments from corresponding emails. This includes not only traditional file attachments but also files (such as images) that are added in-line within emails.

  • Each file attachment will be assigned a unique file_id, with the parent_id corresponding to the email the file was attached to.

  • Please note that the same rules that apply to our file uploads also apply to attachments in terms of file size and supported extensions.
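
Since each attachment carries the parent_id of its email, attachments can be grouped back under their messages on the client side. The following is a rough sketch; the record layout (file_id and parent_id keys on plain dictionaries) is assumed for illustration.

```python
from collections import defaultdict


def attachments_by_email(files: list) -> dict:
    """Group attachment file IDs under the ID of the email they belong to."""
    grouped = defaultdict(list)
    for f in files:
        # Emails themselves are assumed to have no parent_id.
        if f.get("parent_id") is not None:
            grouped[f["parent_id"]].append(f["file_id"])
    return dict(grouped)


# Hypothetical records: one email (10) with two attachments.
files = [
    {"file_id": 10, "parent_id": None},
    {"file_id": 11, "parent_id": 10},
    {"file_id": 12, "parent_id": 10},
]
print(attachments_by_email(files))
```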

Set User File Limits

  • You have the flexibility to set the maximum number of files that a unique customer ID can upload using the file_upload_limit field on the update_users endpoint.

  • This value can be adjusted as needed, allowing you to tailor it according to your own plan limits.

  • Then you can check the upload limit set for a specific user via the custom_limits object on the user endpoint.

  • See details here.

Flags for OCR

  • Added ocr_job_started_at to the user_files_v2 response to denote whether OCR was enabled for a particular file.

  • Added additional OCR properties to be returned via ocr_properties, including whether table parsing was enabled.

  • See details here.
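
A client could read these flags from a user_files_v2 record roughly as follows. The sketch assumes ocr_job_started_at is null or absent when OCR was not used, and it returns ocr_properties untouched because its inner keys are not listed here.

```python
def ocr_status(file_obj: dict) -> tuple:
    """Return (ocr_was_used, ocr_properties) for one user_files_v2 record."""
    # Assumption: ocr_job_started_at is null/absent when OCR was not used.
    ocr_was_used = file_obj.get("ocr_job_started_at") is not None
    properties = file_obj.get("ocr_properties") or {}
    return ocr_was_used, properties


# Hypothetical records; the timestamp value is made up.
with_ocr = {"ocr_job_started_at": "2024-04-01T12:00:00Z", "ocr_properties": {}}
without_ocr = {"ocr_job_started_at": None}
print(ocr_status(with_ocr)[0], ocr_status(without_ocr)[0])
```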

Role Management in Customer Portal

  • You now have the ability to manage who in your organization can create, delete, and view API keys.

  • Here’s a breakdown of the current roles available:

    • Admin: This role is empowered to both create and delete API keys.

    • User: Users with this role can view API keys.

  • Moving forward, these roles will determine user permissions and access across different sections of the Carbon Customer Portal.

  • You can access the customer portal via portal.carbon.ai.

Expanded OCR Support in Carbon Connect

  • The prop useOCR can now be enabled on the integration level for the following connectors (in addition to local files):

    • OneDrive

    • Dropbox

    • Box

    • Google Drive

    • Zotero

    • SharePoint

  • The prop parsePdfTablesWithOcr can now be enabled on the integration level to parse tables with OCR when useOCR is set to true.

  • Please note OCR support is only applicable for PDFs at the moment.

  • You can find more details here.

Return chunk_index on the /embeddings Endpoint

  • We now return the chunk_index for specific chunks returned via the /embeddings endpoint.

  • You can find more details here.
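
One use for chunk_index is restoring document order among the chunks returned for a file. The sketch below assumes each returned chunk is a dictionary with file_id, chunk_index, and content keys; apart from chunk_index, those names are assumptions.

```python
def order_chunks(chunks: list) -> list:
    """Sort chunks by file, then by their position within the file."""
    return sorted(chunks, key=lambda c: (c["file_id"], c["chunk_index"]))


# Hypothetical search results, returned in relevance order.
results = [
    {"file_id": 7, "chunk_index": 3, "content": "third"},
    {"file_id": 7, "chunk_index": 1, "content": "first"},
    {"file_id": 5, "chunk_index": 0, "content": "intro"},
]
ordered = order_chunks(results)
print([(c["file_id"], c["chunk_index"]) for c in ordered])
```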

Migrations between Embedding Models

  • You can now request migrations between embedding models with minimal downtime.

  • Email me if you’re interested. The cost per migration (not including embedding token costs) starts at $850 one-time.

New request_id Field

  • Carbon now accepts a request_id in OAuth URLs, global sync endpoints, and custom sync endpoints (such as Gmail, Outlook, etc.), allowing users to define it as needed. Non-OAuth URL endpoints that auto-sync upon connection (e.g., Freshdesk, Gitbook) also support this value. The request_id can be used to filter files through user_files_v2.

  • With Carbon Connect, setting the useRequestIds parameter to true will trigger automatic assignment of the request_id. This request_id will be returned in INITIATE and ADD/UPDATE callbacks.

    • Note that this setting applies at the component level rather than the integration level.

    • This enhancement is part of version 2.0.0-beta8.

    • Find more details here.
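
If you generate the request_id yourself, a UUID is one reasonable choice (the required format is not stated here, so this is an assumption). The sketch below creates an ID and then filters a local list of file records by it, standing in for the server-side filter on user_files_v2. The function names and record layout are hypothetical.

```python
import uuid


def new_request_id() -> str:
    """Generate a unique ID to attach to a sync request."""
    return str(uuid.uuid4())


def files_for_request(files: list, request_id: str) -> list:
    """Keep only the file records tagged with the given request_id."""
    return [f for f in files if f.get("request_id") == request_id]


request_id = new_request_id()
# Hypothetical file records; in practice these would come from the API.
files = [
    {"id": 1, "request_id": request_id},
    {"id": 2, "request_id": "some-other-request"},
]
matching = files_for_request(files, request_id)
print([f["id"] for f in matching])
```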
