ColossalAI/applications/ColossalQA/colossalqa/utils.py

import re
from typing import Union

from colossalqa.mylogging import get_logger
from sqlalchemy import Engine, MetaData, create_engine
from sqlalchemy.exc import SQLAlchemyError

logger = get_logger()


def drop_table(engine: Engine) -> None:
    """
    Drop all existing tables in the database bound to `engine`.
    """
    # Reflect the current schema so that MetaData knows about every table
    metadata = MetaData()
    metadata.reflect(bind=engine)
    # drop_all handles dependency order between tables and checks existence first
    metadata.drop_all(bind=engine, checkfirst=True)


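def create_empty_sql_database(database_uri: str):
    """
    Create an engine for `database_uri` and connect once to verify it.

    For file-based backends such as SQLite, the first connection creates the
    database file. Returns the engine together with the URI.
    """
    try:
        # Create an SQLAlchemy engine to connect to the database
        engine = create_engine(database_uri)

        # Connect once to create the database, then release the connection
        with engine.connect():
            pass

        logger.info(f"Database created at {database_uri}")
    except SQLAlchemyError as e:
        logger.error(f"Error creating database: {e}")
        # Re-raise: `engine` may be unbound here, so there is nothing valid to return
        raise
    return engine, database_uri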


def destroy_sql_database(sql_engine: Union[Engine, str]) -> None:
    """
    Drop every table in an SQL database and dispose of the engine.

    `sql_engine` may be an Engine or a database URI string.
    """
    if isinstance(sql_engine, str):
        sql_engine = create_engine(sql_engine)
    drop_table(sql_engine)
    sql_engine.dispose()


def detect_lang_naive(s: str) -> str:
    """
    Naive language detection; should be replaced by an independent layer.

    Strips punctuation, digits and ASCII letters. If nothing remains, the text is
    treated as English ("en"); otherwise it is treated as Chinese ("zh").
    """
    remove_nota = "[’·°–!\"#$%&'()*+,-./:;<=>?@，。?★、…【】（）《》？“”‘’！[\\]^_`{|}~]+"
    s = re.sub(remove_nota, "", s)
    s = re.sub("[0-9]", "", s).strip()
    res = re.sub("[a-zA-Z]", "", s).strip()
    return "en" if len(res) == 0 else "zh"