Tables and Indices
Tables
Tables are the base structure for storing data in AtlasDB. Every table has a name, a row specification, a column-value specification, an optional set of constraints on the table, and an optional set of behavior and performance tuning parameters.
There is one main way to add a table to the schema, along with two variants.
schema.addTableDefinition("table_name_here", new TableDefinition() {{
javaName("JavaTableName"); //optional
rowName();
...
columns(); //or dynamicColumns();
...
constraints(); //optional section
...
... //behavior/perf options
}});
The addTableDefinition method takes two arguments: the name of the table to be used in the key-value store itself, and a table definition. The table name should be specified in snake_case. The details of the TableDefinition initialization will be covered in the "Table/Index Definitions" section.
If there are multiple tables which will have the same definition but will have different names, the first variation of the table definition can be used:
schema.addDefinitionForTables(ImmutableSet.of("table1_name", "table2_name"), new TableDefinition() {{
...
}});
Indices
A common pattern in database schemas is to define an index table whose values are derived from and kept in sync with values from a base table. In standard RDBMSs these are user-defined and database-managed, but AtlasDB is not so full-featured and requires you to think more carefully about performance.
There are two kinds of indices which can be defined in AtlasDB: additive and cell-referencing. Both use the dynamic columns layout. For additive indices, each cell in the index is derived from a unique row in the base table. For cell-referencing indices, each cell in the index is derived from a unique cell (not a row of cells) in the base table. For more complicated index situations (e.g. indices whose rows are derived from multiple tables), a regular table must be defined for the index, and synchronization between the base table(s) and the index must be done manually.
schema.addIndexDefinition("index_name_here", new IndexDefinition(IndexType.ADDITIVE /* or .CELL_REFERENCING */) {{
onTable("base_table_name");
onCondition("source_column", " /* java boolean expression */ _value > 100 "); //optional
rowName();
...
dynamicColumns(); //or noColumns();
...
... //behavior/perf options
}});
Note that, in the case where the index should only get a row from the base table if some condition is met, the onCondition clause can be added to the index definition. The value of the cell with the specified column is accessed by the _value term.
If multiple indices should be defined for the same index definition, then the following variant can be used:
schema.addAdditiveIndexesForDefinition(ImmutableSet.of("index1_name", "index2_name"), new IndexDefinition(...) {{
...
}});
The AtlasDB developers, however, strongly recommend against using this form: they have not found it particularly useful for AtlasDB queries, have never used it themselves, and have therefore never tested whether it actually works.
Additive
The components of a cell in an additive index can reference any cells for each row of the base table. Insertions of new rows to the base table and updates to existing rows in the base table will automatically trigger updates to the values in the index tables. Deletes to the base table however are not cascaded to the index tables, and must be done manually. Manual additions, updates, and deletes to an additive index table do not trigger actions on the base table. Additive indices will have a “_aidx” suffix added to their index names.
Cell-Referencing
The components of a cell in a cell-referencing index can only reference cells from a schema-specified column for each row of the base table. Insertions and updates of cells to the base table with the correct column will automatically trigger insertions/updates to corresponding cells in the index table. Deletes to the base table are also cascaded to the index table. Manual additions, updates, and deletes to the cell-referencing table do not trigger actions on the base table. Cell-referencing indices have a “_idx” suffix added to their index names.
Note however that this automatic management comes at a performance cost: writing to a table performs a read from each cell-referencing index (as well as a write) to determine what deletions need to be performed on the index. This read happens synchronously. This can cause writes (which are otherwise asynchronous and batched) to be particularly expensive in tables with cell-referencing indices.
Regular tables
Technically there exists a third type of index, which is simply a regular table with data derived from another table. In this case the row name is a tuple composed of the field on which you're indexing and the primary key (i.e., row name) of the table from which the index table is derived. To look something up, the client can simply do a range scan over index rows whose first component is the value it's looking for. The client then gets back the row names that include that row component, and from there it can look at the remaining components (typically some sort of ID) to find what it's looking for.
The disadvantage of creating such an index is that inserts, updates, and deletes to the table from which the index table is derived all have to be accounted for manually with extra logic.
However, there is an advantage to indexing this way: The client can page through the results, whereas additive or cell-referencing indices have dynamic columns by default, forcing the client to get all of the results at once. This may or may not be an issue depending on the amount of data being stored in a given table.
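For illustration, a hedged sketch of such a manually maintained index table, using only schema methods described elsewhere in this section (the table and component names are hypothetical):

schema.addTableDefinition("user_by_email", new TableDefinition() {{
    rowName();
        rowComponent("email",   ValueType.VAR_STRING); // the field being indexed on
        rowComponent("user_id", ValueType.FIXED_LONG); // the primary key of the base table
    noColumns(); // all information is carried in the row name itself
    rangeScanAllowed(); // needed so clients can range-scan on the email prefix
}});

Lookups are then range scans over rows whose first component equals the value being searched for; application code is responsible for writing, updating, and deleting these index rows whenever the base table changes.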
Table/Index Definitions
The TableDefinition and IndexDefinition are often created as anonymous classes using "double-brace initialization", which allows for more readable code than a conventional builder. Certain initialization methods, such as javaName(), can be called at any time in the method, while others, such as column(), need to be preceded by a "state transition" command, such as columns(). By convention, such definitions are broken down into the following sections:
Definition Parameters such as javaName() define basic properties of tables and indexes and are placed at the beginning.

Row Definitions such as rowComponent() define the rows of the table. The section is begun with a rowName() call. A table must define at least one row component through these methods to be valid.

Named Column Definitions such as column() define the named columns of the table. The section is begun with a columns() call. A table can have a named column section or a dynamic column section, but not both.

Dynamic Column Definitions such as columnComponent() define the dynamic columns of the table. The section is begun with a dynamicColumns() call. A table can have a named column section or a dynamic column section, but not both.

Enabling the V2 Table API by setting the enableV2Table() flag. This will generate an additional table class with some easy-to-use functions such as putColumn(key, value), getColumn(key), and deleteColumn(key). We only provide these methods for named columns, and don't currently support dynamic columns.

Constraint Definitions such as tableConstraint() define constraints on the table (such as foreign key relations). The section is begun with a constraints() call. This section is optional.

Behavioral Parameters such as conflictHandler() define the behavior of the table during run-time. This includes allowed queries, performance optimizations, and concurrency strategies, among others. This section is usually at the end.
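To make the ordering of these sections concrete, the following is a hedged sketch of a complete named-column table definition that uses only methods described in this document; the table, component, and column names are hypothetical.

schema.addTableDefinition("profile", new TableDefinition() {{
    // Definition parameters
    javaTableName("Profile");

    // Row definitions
    rowName();
        rowComponent("user_id", ValueType.FIXED_LONG);

    // Named column definitions
    columns();
        column("display_name", "d", ValueType.VAR_STRING);
        column("login_count",  "c", ValueType.VAR_LONG);

    // Behavioral parameters
    conflictHandler(ConflictHandler.RETRY_ON_WRITE_WRITE);
    rangeScanAllowed();
}});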
Definition Parameters
public void javaTableName(String name);
This method specifies the name of the table to be used in generated Java code for the schema. It should be specified in CamelCase and be as long and descriptive as is useful. If this method is not called, the value will be derived by converting the table's AtlasDB name from snake_case to CamelCase.
Logging Parameters
public void tableNameLogSafety(LogSafety logSafety);
If called, this marks the table name as either safe or unsafe for logging, depending on the argument passed. When AtlasDB logs a table reference for this table, it will be logged as a SafeArg or UnsafeArg respectively, following the Palantir safe-logging library. If this is not specified, the table name defaults to UNSAFE.
public void namedComponentsSafeByDefault();
If called, then row components and named columns that are subsequently defined for this table will be assumed to be safe for logging unless specifically indicated as unsafe. By default, row components and named columns are assumed unsafe unless specifically indicated as safe.
Note that specifying named components as safe by default does not also make the table name considered safe.
public void allSafeForLoggingByDefault();
If called, this marks the table name as safe, and all named components as safe unless they are explicitly marked as unsafe. Note that an exception will be thrown if this method is called alongside tableNameLogSafety(LogSafety.UNSAFE); to achieve that effect (table name unsafe, but all row/column components safe), please use namedComponentsSafeByDefault() instead.
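As a hedged illustration of how these logging parameters combine (the table and component names are hypothetical):

schema.addTableDefinition("events", new TableDefinition() {{
    // Table name remains unsafe for logging, but components default to safe.
    namedComponentsSafeByDefault();

    rowName();
        rowComponent("event_id", ValueType.FIXED_LONG);
        // Explicitly override the default for a component that may hold user-provided data.
        rowComponent("payload_hash", ValueType.SIZED_BLOB, ValueByteOrder.ASCENDING, LogSafety.UNSAFE);

    columns();
        column("payload", "p", ValueType.BLOB);
}});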
Index-specific Parameters
public void onTable(String name);
This method specifies the AtlasDB name of the table which this index definition will derive its data from. This method is required for all IndexDefinitions.
public void onCondition(String sourceColumn, String booleanExpression);
Optional parameter. Specifies that only rows which satisfy the specified boolean expression on the specified source column will be added to the index. The source column must be a valid component name from the source table, and the boolean expression must be a valid Java expression, with _value denoting the value of the source column.
Row Definitions
Each row is uniquely identified by its rowName. Each rowName is composed of at least one rowComponent. Therefore each row is uniquely identified by the ordered tuple of its rowComponent values.
Order matters. For example,
rowName();
rowComponent("object_id", ValueType.FIXED_LONG);
rowComponent("group_id", ValueType.VAR_LONG); partition(GROUP_PARTITIONER);
rowComponent("fragment_version_id", ValueType.VAR_LONG);
This means that each row in this table is uniquely identified by a 3-tuple consisting of an object ID, a group ID, and a fragment version ID.
Only the last rowComponent of a rowName can be set to ValueType.STRING or to ValueType.BLOB, because values of these types do not explicitly or implicitly track their own size. If you need a rowComponent other than the last one to be a string or a byte array, use ValueType.VAR_STRING or ValueType.SIZED_BLOB instead. See the ValueTypes section for more information.
public void rowComponent(String componentName, ValueType valueType, ValueByteOrder valueByteOrder = ValueByteOrder.ASCENDING);
By default, all rows are stored in ascending byte order. This means
range results are iterated in ascending order. If you need to access
rows in reverse order, then adding the ValueByteOrder.DESCENDING
argument will store them in descending order instead.
public void rowComponent(String componentName, ValueType valueType, ValueByteOrder valueByteOrder, LogSafety logSafety = LogSafety.UNSAFE);
You may also identify a row component as being explicitly safe or unsafe for logging. (If this is not specified it defaults to unsafe, or safe if the table was set to default components as being safe.)
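As a hedged illustration of these overloads (the component names are hypothetical):

rowName();
    rowComponent("account_id", ValueType.FIXED_LONG);
    // Stored in descending byte order so range results iterate newest-first.
    rowComponent("timestamp", ValueType.FIXED_LONG, ValueByteOrder.DESCENDING);
    // Explicitly marked as safe for logging.
    rowComponent("region", ValueType.VAR_LONG, ValueByteOrder.ASCENDING, LogSafety.SAFE);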
Warning
You may define an arbitrary number of row components. However, for compatibility with key-value-services where cell sizes are restricted, AtlasDB enforces a maximum length of Cell.MAX_NAME_LENGTH (= 1500) bytes on row names. Please ensure that your rows will remain within that size for all possible inputs - be especially careful with component values that users may be able to define arbitrarily.
Partitioners
Each row component, after being defined, may then have a partitioner specified on it. The partitioner is responsible for splitting the space of possible values for each row component into a number of ranges. Rows are then partitioned according to the row component ranges; rows which fall into the same partition are stored in the same server shard. Performance is optimized by putting rows that are often accessed together in the same partition, while spreading rows equally across all partitions.
public void partition(RowNamePartitioner... partitioners);
public ExplicitRowNamePartitioner explicit(String... componentValues);
public ExplicitRowNamePartitioner explicit(long... componentValues);
public UniformRowNamePartitioner uniform();
By default, all row components use partition(uniform()). If, however, certain values are certain to be stored/accessed very often (the group IDs of the objects, in the above example), they can have partitions explicitly created for them by specifying explicit(...). Note that use of partition() assumes ordered storage of rows; if there is no good way to partition the rows uniformly and range requests are not needed, then hashing the first (or first N) row components of your table would likely be a good idea.
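As a hedged sketch based on the example above (the component names and the explicit values are hypothetical):

rowName();
    rowComponent("object_id", ValueType.FIXED_LONG);
    // Hypothetical: a handful of very hot group IDs get their own partitions.
    rowComponent("group_id",  ValueType.VAR_LONG); partition(explicit(1, 2, 3));
    rowComponent("fragment_version_id", ValueType.VAR_LONG);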
Warning
The most significant component of any table is used by the partitioner to distribute data across the cluster. To avoid hot-spotting, the type of the first row component should NOT be a VAR_LONG, a VAR_SIGNED_LONG, or a SIZED_BLOB.
For a safe data distribution, the usage of hashFirstRowComponent() is suggested.
rowName();
hashFirstRowComponent();
rowComponent("secondary_row_component_of_any_type", ValueType.VAR_LONG);
Also, in the event that the first row component may not be sufficient for even distribution (e.g. it has low cardinality and/or an uneven distribution, but subsequent components are more varied), AtlasDB also offers hashing a prefix of the row key, via hashFirstNRowComponents(int). This is useful, for instance, in stream stores.
rowName();
hashFirstNRowComponents(2);
rowComponent("first_component_not_evenly_distributed", ValueType.VAR_LONG);
rowComponent("second_component_fairly_distributed", ValueType.UUID);
rowComponent("third_component_maybe_expensive_to_hash", ValueType.BLOB);
This will prepend a hash of the first and second components of each row key to the table. Naturally, as hashing involves some overhead, please choose as few components as needed that will still ensure reasonable distribution.
Table Named Columns
For tables using the named columns layout, the column name and value type referenced by each column is specified by a single command.
public void column(String columnName, String shortName, ValueType valueType)
public void column(String columnName, String shortName, Class<?> protoOrPersistable, Compression compression = Compression.NONE)
public void column(String columnName, String shortName, Class<?> protoOrPersistable, Compression compression, LogSafety logSafety = LogSafety.UNSAFE)
The column name is the name of the column that will be used in the generated Java code and table metadata. The short name is a one- or two-character label which will be the actual name for the column when stored in the underlying database. Any ValueType may be used as the value for a column, as well as any protobuf class or Persistable. AtlasDB will handle serializing and deserializing the proto/persistable to and from its byte array representation, and will optionally also compress the byte array to save space using the method you specify. Columns cannot be overloaded with multiple types: each column() call must contain unique column names and short names.
Also, you may explicitly identify the name of this column to be safe or unsafe for logging. We don’t currently support having different safety levels for the column name and the short name. (If this is not specified it defaults to unsafe, or safe if the table was set to default components as being safe.)
If instead you don't need a row to have multiple columns and all table information can be encapsulated in the row components, then the section can instead be specified with noColumns(), which defines the table to contain a single column "exists" with short name "e" and value type VAR_LONG (always zero).
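As a hedged sketch of a named-column section (the column names, short names, and the protobuf class are hypothetical):

columns();
    column("created_at", "t", ValueType.VAR_LONG);
    // A protobuf-valued column; pass a Compression value other than NONE to compress on disk.
    column("profile", "p", UserProfile.class, Compression.NONE);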
Table Dynamic Columns
For dynamic columns, the name-value components that make up the column and the value type referenced by columns are specified by separate commands.
public void columnComponent(String componentName, ValueType valueType, ValueByteOrder byteOrder = ValueByteOrder.ASCENDING)
Each column component is made up of a component name (specified in snake_case), a value type, and optionally a byte ordering. Column components for dynamic columns must be primitive ValueTypes which can be partitioned and ordered. The order of the column component calls is the order in which the components will be stored together. Since dynamic columns of a row are retrieved in byte order, this means column component ordering determines sort ordering for retrieval.
public void value(ValueType valueType)
public void value(Class<? extends AbstractMessage> proto, Compression compression = Compression.NONE)
Every dynamic column will also have a value associated with it, which can be a primitive ValueType or protobuf (optionally compressed).
If values are not needed for the table, specifying value(ValueType.VAR_LONG) and maxValueSize(1) is conventional. The max value size command is a performance hint for AtlasDB.
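A hedged sketch of a dynamic-column section (the component name is hypothetical):

dynamicColumns();
    columnComponent("sequence_number", ValueType.VAR_LONG);
    value(ValueType.VAR_LONG);
    maxValueSize(8); // performance hint: values never exceed 8 bytes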
Warning
You may define an arbitrary number of dynamic column key components. However, for compatibility with key-value-services where cell sizes are restricted, AtlasDB enforces a maximum length of Cell.MAX_NAME_LENGTH (= 1500) bytes on column names. Please ensure that your dynamic column keys will remain within that size for all possible inputs - be especially careful with components that users may be able to define arbitrarily.
Index Rows and Columns
Indices are a little more constrained in their definition than tables, since their components must be primitive value types derived from the pre-existing components of a table. Index definitions also do not get a choice between named and dynamic column types: if the index is defined with columns, then it is dynamic; if defined without columns, then it is named with an implicit noColumns().
Both the rowName() and dynamicColumns() sections use the same methods to define their components:
public void componentFromRow(String componentName,
ValueType valueType,
ValueByteOrder byteOrder = ValueByteOrder.ASCENDING,
String sourceComponentName = componentName);
public void componentFromColumn(String componentName,
ValueType valueType,
ValueByteOrder byteOrder = ValueByteOrder.ASCENDING,
String sourceColumnName = componentName,
String codeToAccessValue);
public void componentFromDynamicColumn(String componentName,
ValueType valueType,
ValueByteOrder byteOrder = ValueByteOrder.ASCENDING,
String sourceComponentName = componentName);
All components define a component name, a value type, a byte order (which defaults to ascending if unspecified), and the component name of the source row/column component, which by default is assumed to be the same as the component name specified earlier. Note that for cell-referencing indexes, all index components derived from columns need to reference the same column.
For components being derived from named columns, an additional "code to access value" argument is required. This argument allows value-type index components to be extracted from more complicated protobuf or serializable column components. The argument must be a valid Java source code expression, where _value is the value of the table component.
For the row definitions section, each componentFromRow call can be succeeded by a partition() call, in the exact same manner as for table rows. For more information, see the Partitioners subsection above.
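As a hedged sketch, a cell-referencing index deriving one component from a named column and one from the base table's row might look like the following; the table, column, and component names are hypothetical.

schema.addIndexDefinition("object_by_score", new IndexDefinition(IndexType.CELL_REFERENCING) {{
    onTable("objects");
    rowName();
        // Extracted from the base table's "score" named column; _value is the raw column value.
        componentFromColumn("score", ValueType.VAR_LONG, "score", "_value");
        // Carried through from the base table's row so index rows remain unique.
        componentFromRow("object_id", ValueType.FIXED_LONG);
}});

Per the partitioning warning above, hashing the first component may be advisable if score values cluster heavily.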
Note
Internally, index rows are stored including a reference to the source column, but this is stripped out in the generated code before being returned to the user. Thus, if one uses a List of results returned from an index table (e.g. through getRowColumns), one may encounter multiple values that appear to be the same. The standard workaround is to use a Set to deduplicate the results. Please see the discussion on issue 604 for more details regarding this behaviour.
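A minimal sketch of that workaround, with placeholder names for the generated row type and the fetching call:

List<IndexRow> results = fetchRowsFromIndex();             // IndexRow and fetchRowsFromIndex are placeholders
Set<IndexRow> deduplicated = new LinkedHashSet<>(results); // drops the duplicate-looking entries, keeps order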
Constraints
Sometimes the set of valid values for a table is smaller than the set of valid values specified by just type information. In these cases, it can be useful to explicitly express these constraints when defining the tables to ensure that code written against these tables does not violate them. The AtlasDB schema allows three different types of constraints to be defined: foreign key constraints, table constraints, and row constraints. Note however that these constraints are intended mostly for staging and debugging environments and will usually not be enabled in production due to their sometimes prohibitive performance costs.
public void foreignKeyConstraint(ForeignKeyConstraintMetadata constraint);
public void tableConstraint(TableConstraint constraint);
public void rowConstraint(RowConstraintMetadata constraint);
Foreign Key Constraints reach across tables and specify that certain components must have matching values with components from another table.
Table Constraints affect the whole table and at present define immutability constraints for those tables. See the javadoc for TableConstraint for details.
Row Constraints define constraints which each table row must satisfy. These constraints can specify, for example, that certain components must be nonnegative, or that certain row components and column components must contain the same value.
Behavioral Parameters
public void conflictHandler(ConflictHandler handler = ConflictHandler.RETRY_ON_WRITE_WRITE);
The conflict handler parameter specifies the MVCC transaction semantics for the table.
public void cachePriority(CachePriority priority = CachePriority.WARM);
Specifies the retention policy for caching AtlasDB queries and their results. Values are COLDEST, COLD, WARM, HOT, HOTTEST. The hotter the setting, the more queries are cached and the longer they are stored.
public void explicitCompressionRequested();
Cassandra only - specifies whether the table should be stored compressed.
public void rangeScanAllowed();
Specifies whether a table should be allowed to have range scans conducted on its rows.
Note
If this option is not selected, you will not be able to use the getRange operation against your table!
public void negativeLookups();
Cassandra only - if certain queries are expected to regularly search for non-existent rows, this will have Cassandra create bloom filters on the rows to speed up the search.
public void maxValueSize(int size);
Performance hint - specifies the size in bytes of the largest value which any given row in the table may hold.
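Putting several of these together, a hedged sketch of the behavioral section at the end of a table definition might look like the following; the particular values are illustrative rather than recommendations.

conflictHandler(ConflictHandler.RETRY_ON_WRITE_WRITE); // default MVCC write-write conflict handling
rangeScanAllowed();             // permit getRange queries against this table
negativeLookups();              // Cassandra only: bloom filters for frequent misses
explicitCompressionRequested(); // Cassandra only: store the table compressed
cachePriority(CachePriority.HOT);
maxValueSize(1024);             // performance hint: values are at most 1 KiB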